Robust estimators of location and dispersion are often used in the
elliptical model to obtain an uncontaminated and highly representative
subsample by trimming the data outside an ellipsoid based on
the associated Mahalanobis distance. Here we analyze some one (or
k)-step Maximum Likelihood Estimators computed on a subsample
obtained with such a procedure.

We introduce different models which arise naturally from the ways 
in which the discarded data can be treated, leading to truncated or 
censored likelihoods, as well as to a likelihood based on an only out- 
liers gross errors model. Results on existence, uniqueness, robustness 
and asymptotic properties of the proposed estimators are included. 
A remarkable fact is that the proposed estimators generally keep the
breakdown point of the initial (robust) estimators, but they can
improve on the rate of convergence of the initial estimator, because our
estimators always converge at rate $n^{1/2}$, independently of the rate of
convergence of the initial estimator.

1. Introduction. Among the methodologies to produce robust and efficient
estimators, we are here concerned with those based on a preliminary
robust estimation followed by one step (or k steps) that improves efficiency
without a significant loss of robustness. In a natural way this leads us to
search for an uncontaminated and highly representative subsample, selected
using the initial estimation, and then to make the improvement step on the
basis of this subsample. These ideas are present, for example, in Rousseeuw 
and van Zomeren [25] or in Lopuhaä and Rousseeuw [19], where it is shown
that some ways of one-step reweighting preserve the breakdown point (BP) 
of the initial estimators. This scheme seems to be particularly adequate to 
preventing gross errors under a model based on a main data stream of an el- 
liptical distribution. The robust estimates are then used as a diagnostic tool 
to select the good observations, such as those at an adequate (Mahalanobis) 
distance from the location estimate. Hence, we could improve the efficiency 
of our estimators, preserving robustness with respect to outliers, by resort- 
ing to classical methods which obtain efficient estimates but compute only 
over the observations considered good. 

In this brief description three main ingredients require consideration: 

• The choice of the robust initial estimator to produce the zone of good 
observations, that is, a suitably trimmed set. 

• Once such a zone has been selected, how to treat the discarded (trimmed) 
data. 

• How to choose the efficient estimator. 

The first item has received considerable attention in the elliptical model, 
which allows us to exploit the symmetries in order to handle gross-errors as 
points far away from the center. Under the usual equivariance requirement 
these robust estimators include well-known proposals like the Minimum Volume
Ellipsoid (MVE), the Minimum Covariance Determinant (MCD) or, in
general, S-estimators (see the book by Maronna, Martin and Yohai [22] for 
a discussion of these and further estimators in this setup). 

The other items have been treated in an unequal way. Usually the one- 
step consists of reweighted least squares statistics based only on good sam- 
ple data, which take advantage of the elliptical symmetry of the underlying 
family (see, e.g., Lopuhaa [18]). Some versions, as in Gervini [12], exploit 
even the possibility of selecting the good sample data region in an adap- 
tive way. A different point of view (see, e.g., Bickel [2] or Davies [8]) resorts 
to a Newton-Raphson step to increase the rate of convergence of the ini- 
tial estimators. Curiously, the maximum likelihood estimator (MLE), being 
a natural choice in order to get the maximum gain in efficiency, has only 
recently been considered in Mayo-Iscar [23] in the mixture model, and in 
Marazzi and Yohai [20] in the context of regression of a real valued vari- 
able. However, under simple truncation (which is the approach followed in 
[20]), existence of the corresponding MLE in any data configuration is not 
guaranteed, so breakdown of the estimator could arise under contamination 
leading to such a configuration. 

Here we will mainly address the last two stressed items. The starting point 
will be that of a given (trimmed) set of the sample space, obtained through 
any of the already enumerated methods and that is consistent under the
underlying model. The consistency of the mentioned methods is well known
(see, e.g., Davies [6, 7] and Butler, Davies and Jhun [4]) under the elliptical
model, but has not been treated in a contamination model like our GEM
proposal below. Since the MVE is better adapted to this contaminated model,
and since it is the choice on which the improvement step has the greatest
impact, even in the uncontaminated model, because of its low efficiency,
this estimate will act as a leitmotif in the paper.
Its consistency is shown in Proposition 3.1.

With respect to the second item, first we will consider the likelihoods asso- 
ciated to the (artificially) truncated and censored models given the set, but 
we will also introduce a model of gross errors contamination and consider 
the associated likelihood. In every case, in connection with the third item, 
we will consider the MLEs. The relations between the MLE associated to 
each likelihood model provide a novel approach to interpret the presence of 
gross errors. Under our model we could adapt the final estimation through 
a second step, based on a cut-off parameter, in a similar way to that intro- 
duced in Gervini and Yohai [13] (improved in [20]) or in Garcia-Escudero 
and Gordaliza [11]. However, in order to make the comprehension of the 
methodology easier, we will not consider here these adaptive ways of enlarg- 
ing the MVE to improve the final efficiency of the estimator, although the 
corresponding analysis would be parallel to the one developed here. 

An important feature of our approach is the rate of convergence of the 
final estimator. As a distinctive fact with respect to the known one-step 
reweighted estimators, our estimators converge at rate $n^{1/2}$, independently
of the rate of convergence of the initial estimator, whenever it is consis- 
tent. This allows any consistent initial estimator to be considered, even the 
MVE that converges at $n^{1/3}$ rate, as an initial estimator without loss in the
rate of convergence. This happens because, although based on only a part 
of the sample, our second step is a genuine MLE, so it is able to make a 
full reconsideration of the initial estimation. On the contrary, this is not 
possible if we only make a linear estimation based on reweighting in accor- 
dance with the initial estimation. These considerations agree with those in 
Rousseeuw [24] or He and Portnoy [14], where it is stressed that the problem 
is reweighting. However, the estimators considered in the already mentioned 
literature as one-step improvements to get the $n^{1/2}$-rate of convergence are
based on a Newton-Raphson adjustment (see also Jureckova and Portnoy 
[16] or Jureckova and Sen [17]). In fact, in [20], it is even suggested (see Re- 
mark 2 after Theorem 3 there) that the rate of convergence of the truncated 
MLE could be the same as that of the initial estimator. 

The paper is organized as follows. In Section 2 we introduce the models 
and the estimators to be studied, and analyze the identifiability of the mod- 
els. Moreover, we discuss the existence and uniqueness of the MLEs under 
the truncated, the censored and the gross errors models (GEM). We stress 
the fact that truncation can produce the nonexistence of the MLE, and
hence produce the breakdown of the estimator; thus, we introduce natural 
restrictions which guarantee the existence of the MLE under truncation. In 
this setting we obtain new results, even in the univariate case, which include 
the consideration of exponential families. Section 3 is devoted to studying 
the robustness and the asymptotic properties of the proposed estimators, 
including the BP and the influence function (IF), as well as the consistency 
and asymptotic normality at rate $n^{1/2}$. In Section 4 we present our conclusions
on these estimators. The paper ends with an Appendix containing all
the proofs and some technical results. 

The estimation will be carried out on the basis of the data set $\mathcal{X} =
\{x_1, x_2, \ldots, x_n\}$. In the asymptotic results we will assume that
$\mathcal{X}$ is obtained from n independent, identically distributed
$\mathbb{R}^p$-valued random vectors $X_1, X_2, \ldots, X_n$. $\mathbb{P}_n$ will be
the associated sample distribution. The usual norm in $\mathbb{R}^p$ will be
denoted by $\|\cdot\|$, the $\sigma$-field of Borel sets will be $\beta^p$ and
$\lambda^p$ will be the Lebesgue measure. When we use matrix notation, vectors must
be understood as column vectors. For a matrix $H = (h_{ij})$, $H^T$ will denote
its transpose and $|H|$ its determinant. $B(m,r)$ [resp. $S(m,r)$] will denote
the open ball (resp. sphere) of radius r centered at m, while $A^c$ is the
complement of the set A, and $I_A$ is its indicator function. Further notation will
be introduced throughout the paper as necessary.

We will make use of a generic $\omega$ in a probabilistic space of reference; almost
surely (a.s.) statements must be understood as relative to that space.
$\mathbb{P}_n^{\omega}$ would be then a realization of $\mathbb{P}_n$. Integration of a
random variable h with respect to a probability P will be denoted as $Ph$ (and is
interpreted componentwise when h is a vector). We will use matrix notation for
partial derivatives of a function. Given $g : \Theta \times \Phi \times \mathbb{R}^p
\to \mathbb{R}$, where $\Theta \subset \mathbb{R}^d$, $\Phi \subset \mathbb{R}^k$,
$\frac{\partial}{\partial\theta} g(\theta, \phi, x)$ will denote the d-dimensional
vector with components $\frac{\partial}{\partial\theta_i} g(\theta, \phi, x)$,
$i = 1, \ldots, d$.

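In our setup the choice of an initial set leads us to consider ellipsoids
indexed by the set $\Gamma := \mathbb{R}^p \times \mathcal{M}_{p \times p} \times
\mathbb{R}^+$, where $\mathcal{M}_{p \times p}$ is the set of positive-definite
symmetric $p \times p$ matrices. For $\gamma = (\mu, \Sigma, r) \in \Gamma$, we will denote
$\mathcal{E}(\gamma) := \{x \in \mathbb{R}^p : (x - \mu)^T \Sigma^{-1} (x - \mu) < r^2\}$.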
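
As an illustration of the set $\mathcal{E}(\gamma)$, the following minimal sketch checks membership in the ellipsoid through the squared Mahalanobis distance and trims a sample accordingly. The function names (`cholesky`, `mahalanobis_sq`, `in_ellipsoid`, `trim`) are hypothetical and chosen only for this example; the Cholesky route is one convenient way to evaluate the quadratic form, not something prescribed by the paper.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mahalanobis_sq(x, mu, S):
    """Squared Mahalanobis distance (x - mu)^T S^{-1} (x - mu)."""
    L = cholesky(S)
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves L z = d; then the distance is |z|^2.
    z = []
    for i in range(len(d)):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in z)


def in_ellipsoid(x, mu, S, r):
    """True when x lies in the open ellipsoid E(mu, S, r)."""
    return mahalanobis_sq(x, mu, S) < r * r


def trim(sample, mu, S, r):
    """Keep only the sample points that fall inside E(mu, S, r)."""
    return [x for x in sample if in_ellipsoid(x, mu, S, r)]
```

For instance, `trim(points, [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]], 1.0)` keeps the points of `points` lying strictly inside the ellipse with semi-axes 2 and 1.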

2. Maximum likelihood estimation with a trimmed sample. We begin 
with the minimal assumptions that we consider throughout this paper. 

Definition 2.1. The elliptical model associated to the nonincreasing
function $g : \mathbb{R}^+ \to \mathbb{R}^+$ is a family $\{\mathbb{P}_\theta : \theta \in
\Theta\}$ of probabilities on $\beta^p$ with densities $f_\theta$ (with respect to
$\lambda^p$) given by

(2.1) $f_\theta(x) = |\Sigma|^{-1/2} g((x - \mu)^T \Sigma^{-1} (x - \mu))$,

where $\Theta := \mathbb{R}^p \times \mathcal{M}_{p \times p}$, and $\theta := (\mu, \Sigma)
\in \Theta$. Note that $\mathcal{M}_{p \times p}$, considered as a subset of
$\mathbb{R}^{p(p+1)/2}$, is open, $\mu$ is the mean of $\mathbb{P}_\theta$ when it exists,
while, if the second moment is finite, the variance-covariance matrix of
$\mathbb{P}_\theta$ is proportional to $\Sigma$.

We will also handle a contaminated version of this model: the only outliers
gross errors model. This term is used to indicate that $\mathbb{P}_\theta$ could be
contaminated with a small proportion of data coming from a distribution whose
support is external to a central part of $\mathbb{P}_\theta$. Throughout, a central part
of an elliptical distribution must be understood as an ellipsoid
$\mathcal{E}(\mu, \Sigma, r)$ where $\mu$ and $\Sigma$ are the location and scatter
parameters of the distribution. These sets have the nice property of being scaled
versions of the MVE; see Lemma A.3 (but also of the MCD, see Butler, Davies and
Jhun [4]). Recall that an ellipsoid A is an MVE of P if $P(A) \geq 1/2$ and the
volume of A is minimal in the class of all ellipsoids with this property.

Existence and uniqueness of MVE's are discussed in Davies [7]. As stated,
if $\mathbb{P}_{(\mu,\Sigma)}$ belongs to the elliptical model, then its MVE's are
essentially given by ellipsoids $\mathcal{E}(\mu, \Sigma, r)$, where r depends on $\mu$
and $\Sigma$ but also on g and p. In particular, when g is strictly decreasing the
MVE is unique. Below we describe our version of the GEM that is considered from
now on.

Definition 2.2 (Gross error model). A distribution P belongs to the
Gross Error Model associated to the family $\{\mathbb{P}_\theta : \theta \in \Theta\}$ if
there exist $\pi \in [0, 1)$, a probability Q and $\theta \in \Theta$, such that

(2.2) $P = (1 - \pi)\mathbb{P}_\theta + \pi Q$,

where Q is a probability distribution such that if A is any MVE of P, then
$Q(A) = 0$ (whence A is also a central part of $\mathbb{P}_\theta$).

Our proposal to produce the estimator is this: First, through a consistent
estimator of $\theta$, we produce an estimation $\mathcal{E}(\hat\mu, \hat\Sigma, \hat r)$
of the MVE of P because, even in the GEM, at least asymptotically, the values in
the sample that remain in the estimated MVE would be produced by the elliptical
part of P. Then, in order to maintain the BP of the initial estimator (see
Theorem 3.1) while achieving the highest possible efficiency, we will enlarge the
estimated MVE by keeping its location, $\hat\mu$, and shape, $\hat\Sigma$, but taking a
greater value than $\hat r$ to get a scaled version of the estimated MVE (thus
containing more points of the sample). As a final step we will construct an MLE
of $\mu$ and $\Sigma$ based on the observations lying in this scaled MVE.

This program essentially coincides with that introduced in [20], although, 
to make the exposition easier, we will focus on the one-step estimator based 
on the (nonenlarged) MVE. However, in Tables 1, 2 and 3, in Appendix B, we 
will present the gains in efficiency attained by handling the scaled versions. 
2.1. Identifiability of the model. Since we will discard the data not included
in an ellipsoid, we need to ensure that the parameters of an elliptical
distribution are identifiable from every ellipsoid or, more generally, from
every open set. This holds in the Gaussian case and other general models like
the exponential family, as we show in Proposition 2.1.

Proposition 2.1. Let $\{f_\theta : \theta \in \Theta\}$ be the density functions of a
d-parameter exponential family with respect to a $\sigma$-finite measure $\lambda$ on
$\mathbb{R}^p$, where

(2.3) [display lost in the source; it expresses $f_\theta$ in exponential-family
form through functions $Q_j(\theta)$ and $T_j(x)$, $j = 1, \ldots, d$]

Assume that the $Q_j$'s do not satisfy a linear constraint on $\Theta$. Let
$A \in \beta^p$ such that the $T_j$'s $\lambda$-a.s. do not satisfy a linear constraint
on A. If $f_{\theta_1} = f_{\theta_2}$ $\lambda$-a.s. on A, then $\theta_1 = \theta_2$.

The identifiability requirement of Proposition 2.1 can be relaxed, because
we only need to guarantee that each probability is identifiable from the
rest over adequate sets. Broadly speaking, we can say that a set A is adequate
for $\mathbb{P}_\theta$ if $\mu \in A$ and $f_\theta$ is not constant on A.

Since g is nonincreasing, to ensure that g is neither constant on A nor
on affine transformations of A, a natural hypothesis is to assume that g
is strictly decreasing. But, since the ellipsoids of interest (asymptotically)
contain $\mu$, it is enough to demand that g is strictly decreasing near zero, or,
more generally, that g fulfills the following condition:

(G1) There exists a strictly decreasing sequence $\{t_n\}$ which converges to
zero, such that $g(t_n) < g(t_{n+1})$ for every n.

Proposition 2.2. Let $\{\mathbb{P}_\theta : \theta \in \Theta\}$ be the elliptical model on
$\mathbb{R}^p$, $p \geq 1$, associated to g which verifies condition G1. Let
$\theta_0 = (\mu_0, \Sigma_0) \in \Theta$ and A be an open set in $\mathbb{R}^p$, such that
$\mu_0 \in A$. If $\theta \in \Theta$, $\theta \neq \theta_0$, then

$\mathbb{P}_{\theta_0}\{x \in A : f_\theta(x) \neq k f_{\theta_0}(x)\} > 0$,

for every $k > 0$.

2.2. The estimators. Once we know that it is possible to estimate the
elliptical part from adequate sets, we will analyze some estimation procedures
related to this task. In Section 2.2.1 we will consider the MLE associated
to (in our case artificially) truncated or censored samples. In Section 2.2.2,
under the GEM, we design a new estimator, called the Smart estimator.
In every case their effective computation can be implemented through the
EM algorithm. Section 2.2.3 explores the existence and uniqueness of these
estimators.
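
To make the remark about the EM algorithm concrete, here is a minimal sketch for the simplest case: a univariate normal model truncated to an interval $A = (a, b)$, using only the observations inside A. It is an illustration under these simplifying assumptions, not the authors' implementation; the function names are hypothetical. The update follows from treating the unobserved points outside A as missing data, which reduces each EM step to correcting the current first and second moments by the gap between the sample moments and the model's truncated moments.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for pdf and cdf


def truncated_moments(mu, sigma, a, b):
    """Return P(A), E[X | A] and E[X^2 | A] for X ~ N(mu, sigma^2), A = (a, b)."""
    al, be = (a - mu) / sigma, (b - mu) / sigma
    p = STD.cdf(be) - STD.cdf(al)
    delta = (STD.pdf(al) - STD.pdf(be)) / p
    m1 = mu + sigma * delta
    m2 = (mu * mu
          + sigma * sigma * (1.0 + (al * STD.pdf(al) - be * STD.pdf(be)) / p)
          + 2.0 * mu * sigma * delta)
    return p, m1, m2


def em_truncated_normal(inside, a, b, iters=500, tol=1e-12):
    """EM iterations for the truncated-normal MLE based on the points in (a, b)."""
    m = len(inside)
    xbar = sum(inside) / m
    x2bar = sum(x * x for x in inside) / m
    mu, var = xbar, x2bar - xbar * xbar  # start from the naive estimates
    for _ in range(iters):
        p, m1, m2 = truncated_moments(mu, math.sqrt(var), a, b)
        # Complete-data moments: observed part plus expected contribution
        # of the unobserved points outside A under the current parameters.
        new_mu = mu + p * (xbar - m1)
        new_m2 = (var + mu * mu) + p * (x2bar - m2)
        new_var = new_m2 - new_mu * new_mu
        done = abs(new_mu - mu) + abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if done:
            break
    return mu, math.sqrt(var)
```

At a fixed point the truncated moments of the fitted model match the sample moments of the retained points, which is the likelihood equation of the truncated model. When the truncated MLE does not exist, the iterates are expected to drift (the variance keeps growing) instead of settling.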
2.2.1. Censored and truncated estimators. The difference between truncated
and censored estimators lies in the way in which they consider the
discarded points in the sample. The censored one forgets the actual values of
the points outside A, but takes into account their number. Thus, this number
should appear in the likelihood function but related to no specific point. To
this end, we introduce an artificial point c, not necessarily in $\mathbb{R}^p$,
which is only used to count the number of censored points. Thus, the objective
function is the censored log-likelihood:

(2.4) $L^c_{\theta/A}(x) := I_A(x) \log f_\theta(x) + I_{\{c\}}(x) \log \mathbb{P}_\theta(A^c)$, $x \in \mathbb{R}^p \cup \{c\}$;

here the points in $A^c$ are treated as located on the identical censored state
$c \notin A$, and $0 \times \infty$ is taken as 0. This log-likelihood corresponds to
the model $\{\mathbb{P}^c_{\theta,A} : \theta \in \Theta\}$ given by
$\mathbb{P}^c_{\theta,A}(B) = \mathbb{P}_\theta(B \cap A) + \mathbb{P}_\theta(A^c) I_B(c)$,
$B \in \sigma(\beta^p \cup \{c\})$, with density function
$f^c_{\theta,A}(x) = f_\theta(x) I_A(x) + \mathbb{P}_\theta(A^c) I_{\{c\}}(x)$,
$x \in \mathbb{R}^p \cup \{c\}$, with respect to the measure $\lambda^p + \delta_{\{c\}}$,
where $\delta_{\{c\}}$ is the Dirac measure on c.

Obviously the MLE based on $L^c_{\theta/A}$ is more stable than the usual MLE
in the presence of some contamination in $A^c$. However, this contamination can
produce an excessive weight on $A^c$. To protect against this possibility, we
can consider truncation, which does not consider at all the data in $A^c$, the
truncated log-likelihood being

(2.5) $L^t_{\theta/A}(x) := I_A(x) \log \frac{f_\theta(x)}{\mathbb{P}_\theta(A)}$, $x \in \mathbb{R}^p$.

This corresponds to the model $\{\mathbb{P}^t_{\theta,A} : \theta \in \Theta\}$ defined
through the density functions $f^t_{\theta,A}(x) = I_A(x) f_\theta(x) / \mathbb{P}_\theta(A)$,
$x \in \mathbb{R}^p$, with respect to $\lambda^p$. In agreement with the obvious
incompatibility which would arise for those $\theta$'s such that
$\mathbb{P}_\theta(A) = 0$, we adopt the convention that $L^t_{\theta/A} = -\infty$ if
this happens.

Thus, given a sample $\mathcal{X} = \{x_1, \ldots, x_n\}$, the maximum likelihood
censored (resp. truncated) estimator, MLE(c) [resp. MLE(t)], on an appropriately
chosen set A will be the value $\hat\theta_{c,n}$ (resp. $\hat\theta_{t,n}$) maximizing
$\mathbb{P}_n L^c_{\theta/A}$ (resp. $\mathbb{P}_n L^t_{\theta/A}$).
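
The two sample objective functions can be sketched numerically in the simplest setting. The example below assumes a univariate normal model and an interval $A = (a, b)$; both choices, as well as the function names, are assumptions made for illustration only. Each function returns the empirical mean of the corresponding log-likelihood over the whole sample, so the discarded points contribute only through their count (censored case) or not at all (truncated case).

```python
import math
from statistics import NormalDist


def censored_loglik(sample, mu, sigma, a, b):
    """Empirical mean of the censored log-likelihood for N(mu, sigma^2), A = (a, b)."""
    d = NormalDist(mu, sigma)
    p_in = d.cdf(b) - d.cdf(a)
    inside = [x for x in sample if a < x < b]
    n_out = len(sample) - len(inside)
    total = sum(math.log(d.pdf(x)) for x in inside)
    if n_out > 0:  # convention 0 * infinity = 0 when no point is censored
        if p_in >= 1.0:
            return -math.inf
        total += n_out * math.log(1.0 - p_in)
    return total / len(sample)


def truncated_loglik(sample, mu, sigma, a, b):
    """Empirical mean of the truncated log-likelihood for N(mu, sigma^2), A = (a, b)."""
    d = NormalDist(mu, sigma)
    p_in = d.cdf(b) - d.cdf(a)
    if p_in <= 0.0:  # convention: minus infinity when P_theta(A) = 0
        return -math.inf
    inside = [x for x in sample if a < x < b]
    return sum(math.log(d.pdf(x) / p_in) for x in inside) / len(sample)
```

For a fixed parameter value the two objectives differ by $\mathbb{P}_n(A) \log \mathbb{P}_\theta(A) + \mathbb{P}_n(A^c) \log \mathbb{P}_\theta(A^c)$, which is the term through which the number of discarded points enters the censored likelihood.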

2.2.2. A data based choice: The smart estimator. We present an estimator
which takes full advantage of the GEM. As far as we know, likelihood-based
estimators under this model have not yet been proposed.

We have to face two difficulties: There is not a unique set A related to
the model and, given an observation in $A^c$, we do not know whether this
observation comes from $\mathbb{P}_\theta$ or from Q. To circumvent the first difficulty,
given a sample of size n, for every suitable set A, we can consider the
log-likelihood of $n_1$ (resp. $n_2$) data points in A (resp. in $A^c$) arising from
$\mathbb{P}_\theta$ and $n_3 = n - n_1 - n_2$ from the contaminating source also in $A^c$.
For the second, we consider a model in which we have the complete information
for the data in A and only the global ($n_2 + n_3$) number of points in $A^c$.

Thus, we have to make a first estimation to get a suitable set A, which
can be understood as a nuisance parameter in the model, and carry out the final
estimation on the basis of the likelihood associated to this empirical set. In
analogy with the censored likelihood in (2.4), we consider an ideal censored
state c and, for every $x \in \mathbb{R}^p \cup \{c\}$, define the log-likelihood

(2.6) $L^s_{\theta,\pi/A}(x) := I_A(x) \log((1 - \pi) f_\theta(x)) + I_{\{c\}}(x) \log((1 - \pi)\mathbb{P}_\theta(A^c) + \pi)$,

which is associated to the model $\{\mathbb{P}^s_{\theta,\pi,A} : \theta \in \Theta, \pi \in [0, 1)\}$
given by

(2.7) $\mathbb{P}^s_{\theta,\pi,A}(B) = (1 - \pi)\mathbb{P}_\theta(B \cap A) + ((1 - \pi)\mathbb{P}_\theta(A^c) + \pi) I_B(c)$,

where $B \in \sigma(\beta^p \cup \{c\})$. The sample objective function to maximize is now

(2.8) $\mathbb{P}_n L^s_{\theta,\pi/A} = \mathbb{P}_n(I_A \log((1 - \pi) f_\theta) + I_{A^c} \log((1 - \pi)\mathbb{P}_\theta(A^c) + \pi))$,

under the restriction $\pi \in [0, 1)$. The estimator obtained by maximizing this
objective function will be called smart MLE [MLE(s)]. The analysis of the
existence of this estimator, to be carried out in the next subsection, will shed
new light on our proposal for this problem.

2.2.3. On the existence and uniqueness of the estimators. The existence
and uniqueness of the MLE is not an easy problem. In fact, the truncated
normal model is often used as an example of possible nonexistence of the MLE.
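
A small numerical illustration of this phenomenon, under assumptions chosen only for the example (univariate normal model, truncation set $A = (-1, 1)$, location fixed at 0, and four hypothetical observations close to the endpoints): the truncated log-likelihood keeps increasing as $\sigma$ grows and approaches, without reaching, the value attained by the uniform density on A, which is not a member of the normal family.

```python
import math
from statistics import NormalDist


def truncated_loglik_sum(inside, mu, sigma, a, b):
    """Sum of log(f(x) / P(A)) over the retained points, for N(mu, sigma^2)."""
    d = NormalDist(mu, sigma)
    p_in = d.cdf(b) - d.cdf(a)
    return sum(math.log(d.pdf(x) / p_in) for x in inside)


# Hypothetical retained points, concentrated near the endpoints of A = (-1, 1).
points = [-0.9, -0.8, 0.8, 0.9]

# Truncated log-likelihood for increasing values of sigma (mu fixed at 0).
values = [truncated_loglik_sum(points, 0.0, s, -1.0, 1.0) for s in (1.0, 10.0, 100.0)]

# Log-likelihood of the uniform density 1/2 on (-1, 1), the limit as sigma grows.
uniform_limit = len(points) * math.log(0.5)
```

The entries of `values` increase towards `uniform_limit`, suggesting that the supremum over $\sigma$ is not attained for this configuration.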

For the elliptical model, Maronna [21] treated the problem of existence 
and uniqueness of M-estimators, for the model and the sample, but his 
assumptions on g are not satisfied, for example, by the normal model or by 
our models related to truncation or censoring. 

Under the theoretical model both facts are an easy consequence of Jensen's 
(strict) inequality and the identifiability. The proof is similar to the classic 
one. 

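Proposition 2.3. Under the hypotheses of Proposition 2.2, for every
$\theta \neq \theta_0$, we have

(2.9) $\mathbb{P}^c_{\theta_0,A} L^c_{\theta_0/A} > \mathbb{P}^c_{\theta_0,A} L^c_{\theta/A}$

and

(2.10) $\mathbb{P}^t_{\theta_0,A} L^t_{\theta_0/A} > \mathbb{P}^t_{\theta_0,A} L^t_{\theta/A}$,

and, for every $(\theta, \pi) \in \Theta \times [0, 1) - \{(\theta_0, \pi_0)\}$,

(2.11) $\mathbb{P}^s_{\theta_0,\pi_0,A} L^s_{\theta_0,\pi_0/A} > \mathbb{P}^s_{\theta_0,\pi_0,A} L^s_{\theta,\pi/A}$.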
To obtain the existence of the MLEs, we should avoid, for a sample in
general position, a solution that degenerates into a lower dimension. This is
related to the speed at which g decreases and leads us to introduce the following
assumption Gp. Moreover, we will impose the continuity of g as another
natural requirement:

(Gp) If $p > 1$, then there exists $\gamma > p/2$ such that $\lim_{r \to \infty} r^{\gamma} g(r) = 0$.
(G2) g is continuous on $\mathbb{R}^+$.

Note that, by Scheffé's lemma, G2 implies that

(2.12) $\sup_{A \in \beta^p} |\mathbb{P}_{\theta_n}(A) - \mathbb{P}_{\theta_0}(A)| \to 0$, whenever $\theta_n \to \theta_0$.

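Proposition 2.4 (Existence of nonrestricted MLE). Let g be a function
which defines an elliptical family on $\mathbb{R}^p$ and satisfies G1, G2 and Gp. Let
$n \geq 2$, if $p = 1$, and let n exceed a lower bound depending on p and $\gamma$
(the expression is illegible in the source) in the case $p > 1$, where $\gamma$ is the
constant which appears in Gp.

Then, for every data set $\mathcal{X} = \{x_1, \ldots, x_n\}$ whose points are in general
position, there exists $(\hat\mu, \hat\Sigma) \in \Theta$ such that

$\prod_{i=1}^{n} f_{(\hat\mu,\hat\Sigma)}(x_i) \geq \prod_{i=1}^{n} f_{(\mu,\Sigma)}(x_i)$, for every $(\mu, \Sigma) \in \Theta$.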

The next proposition proves the existence of the smart and censored es- 
timators. 

Proposition 2.5 [Existence of MLE(s) and MLE(c)]. Assume that g
defines an elliptical family on $\mathbb{R}^p$ and satisfies G1, G2 and Gp, and let
$\mathcal{X} = \{x_1, \ldots, x_n\}$ be a data set.

Let $A \in \beta^p$ such that the number of points, m, in the set $\mathcal{X} \cap A$
satisfies that $m \geq 2$, if $p = 1$, and that m exceeds a lower bound depending
on p and $\gamma$ (the expression is illegible in the source) in the case $p > 1$,
where $\gamma$ is the constant which appears in Gp.

If the points in $\mathcal{X} \cap A$ are in general position, then there exist the MLE(s),
$(\hat\theta_{s,n}, \hat\pi_{s,n})$, and the MLE(c), $\hat\theta_{c,n}$, based on the sample
$\mathcal{X}$ and A.

The existence of the MLE(t) cannot be shown with the same argument
because the denominator $\mathbb{P}_\theta(A)$ in (2.5) could converge to zero. In fact,
this can lead to nonexistence of the MLE(t). This difficulty can be handled
on the basis that the sets A under consideration will be estimations of the
MVE of P, thus their probabilities must be large enough.

Given $\alpha > 0$ and the ellipsoid A, let

(2.13) $\Theta^{\alpha}_A := \{\theta \in \Theta : \mathbb{P}_\theta(A) \geq \alpha\}$.

Assume that P is a probability obtained by contaminating $\mathbb{P}_{\theta_0}$ by any
probability Q with $Q(A^c) = 1$, and $\mathbb{P}_{\theta_0}(A) = \alpha_0$. We will obtain in
Proposition 3.2 that, asymptotically, if $\alpha \in (0, \alpha_0)$, the restrictions
$\Theta^{\alpha}_{A_n}$ obtained from the sample MVE, $A_n$, are satisfied by every
$\theta$ in a neighborhood of $\theta_0$. Moreover, as stated below, the truncated
likelihood function constrained to the set $\Theta^{\alpha}_A$ has a maximum. These
facts allow us to consider the MLE(r) or constrained MLE(t), to be denoted
$\hat\theta_{r,n}$, as a substitute of the MLE(t).

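Proposition 2.6. Given $\alpha > 0$, let $\Theta^{\alpha}_A$ be defined as in (2.13). Let us
assume the hypotheses in Proposition 2.5 for g, $\mathcal{X}$ and A.

If the points in $\mathcal{X} \cap A$ are in general position, then there exists
$\hat\theta_{r,n} \in \Theta^{\alpha}_A$, such that

$\mathbb{P}_n L^t_{\hat\theta_{r,n}/A} = \sup_{\theta \in \Theta^{\alpha}_A} \mathbb{P}_n L^t_{\theta/A}$.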



Dependence of the constrained solutions on the $\alpha$-value could be considered
as a drawback of this proposal. However, as shown in the next proposition,
in our setup the level defining the restriction will arise in a natural way,
justifying our considering the MLE(r) for $\alpha = \mathbb{P}_n(A)$ as a natural MLE(r).

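Proposition 2.7. Let us assume the hypotheses in Proposition 2.5.
Let $(\hat\theta_{s,n}, \hat\pi_{s,n})$ be an MLE(s) and let us define, for every $\theta \in \Theta$,

(2.14) $\pi^*(\theta) := \frac{\mathbb{P}_\theta(A) - \mathbb{P}_n(A)}{\mathbb{P}_\theta(A)}$.

If $\hat\pi_{s,n} = 0$, then $\hat\theta_{s,n}$ is an MLE(c).

If $\hat\pi_{s,n} > 0$, then $\hat\pi_{s,n} = \pi^*(\hat\theta_{s,n})$ and $\hat\theta_{s,n}$ is an
MLE(r) restricted to $\Theta^{\alpha}_A$, for $\alpha = \mathbb{P}_n(A)$.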
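
The role of $\pi^*$ can be checked directly from the objective function (2.8): for a fixed $\theta$, it is the value of $\pi$ at which the derivative of (2.8) with respect to $\pi$ vanishes. The short derivation below is a sketch added for convenience; the shorthand $q := \mathbb{P}_n(A)$, $P := \mathbb{P}_\theta(A)$ and $u := 1 - \pi$ is introduced here and is not part of the paper's notation.

```latex
\begin{align*}
\mathbb{P}_n L^s_{\theta,\pi/A}
  &= \mathbb{P}_n(I_A \log f_\theta) + q \log u + (1-q)\log(1 - uP), \\
\frac{\partial}{\partial u}\bigl[\, q \log u + (1-q)\log(1 - uP) \,\bigr]
  &= \frac{q}{u} - \frac{(1-q)P}{1 - uP} = 0
  \iff u = \frac{q}{P}, \\
\pi &= 1 - \frac{q}{P}
  = \frac{\mathbb{P}_\theta(A) - \mathbb{P}_n(A)}{\mathbb{P}_\theta(A)} .
\end{align*}
```

Substituting $u = q/P$ back gives $\mathbb{P}_n L^s_{\theta,\pi/A} = \mathbb{P}_n L^t_{\theta/A} + q \log q + (1-q)\log(1-q)$, so that, whenever the resulting $\pi$ is positive, maximizing over $\theta$ amounts to maximizing the truncated likelihood subject to $\mathbb{P}_\theta(A) > \mathbb{P}_n(A)$.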

The key to compare the proposed estimators is the MLE(t) when it exists
(see Theorem 2.1 and Proposition 3.2). From the arguments in the proof
of Proposition 2.7, if $\pi^*(\hat\theta_{t,n}) > 0$ then
$(\hat\theta_{t,n}, \pi^*(\hat\theta_{t,n}))$ would be the MLE(s), while if
$\pi^*(\hat\theta_{t,n}) < 0$, then the maximum of $\mathbb{P}_n L^s_{\theta,\cdot/A}$ on [0, 1]
is obtained for $\pi = 0$, so the solution given by the MLE(c) and $\pi = 0$ would be
preferable. In other words, although the MLE(c) always exists under the
assumptions in Proposition 2.5, it is only preferred when the MLE(t) causes
trouble, either because the MLE(t) does not exist or because the associated
estimation of $\pi$ (given by $\hat\pi_{t,n}$) is negative. But the MLE(t) only takes
into account the data inside A, thus the trouble appears either because they are
not likely enough to arise from the elliptical distribution, or because this
estimation leads us to expect more sample data outside A. Proposition 2.8
highlights these facts.
Proposition 2.8. Assume the assumptions and notation of Proposition
2.7 and that there exists an MLE(t), $\hat\theta_{t,n}$. Then $\pi^*(\hat\theta_{t,n}) > 0$
implies that $(\hat\theta_{t,n}, \pi^*(\hat\theta_{t,n}))$ is an MLE(s). Otherwise
$(\hat\theta_{c,n}, 0)$ would be an MLE(s).
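
The selection rule described by this proposition can be sketched as follows, again in the illustrative univariate normal setting with $A = (a, b)$. The truncated and censored estimates are taken as inputs (how they are computed is outside this sketch), and $\pi^*$ is evaluated as the maximizer in $\pi$ of (2.8) for fixed $\theta$, namely $(\mathbb{P}_\theta(A) - \mathbb{P}_n(A)) / \mathbb{P}_\theta(A)$. The function names and the use of `None` to signal a nonexistent MLE(t) are assumptions of the example.

```python
from statistics import NormalDist


def pi_star(p_theta_A, p_n_A):
    """Contamination level implied by theta: (P_theta(A) - P_n(A)) / P_theta(A)."""
    return (p_theta_A - p_n_A) / p_theta_A


def smart_estimate(theta_t, theta_c, sample, a, b):
    """Choose between the truncated and the censored estimates.

    theta_t: (mu, sigma) from the truncated likelihood, or None if it does not exist.
    theta_c: (mu, sigma) from the censored likelihood.
    Returns the selected (mu, sigma) together with the estimated contamination level.
    """
    n_in = sum(1 for x in sample if a < x < b)
    p_n = n_in / len(sample)  # empirical probability of A
    if theta_t is not None:
        mu, sigma = theta_t
        d = NormalDist(mu, sigma)
        p_theta = d.cdf(b) - d.cdf(a)
        level = pi_star(p_theta, p_n)
        if level > 0.0:
            return theta_t, level
    # Fall back to the censored solution with no contamination.
    return theta_c, 0.0
```

With half of the sample outside $A = (-2, 2)$ and a truncated fit close to the standard normal, the rule keeps the truncated estimate and reports a contamination level near 0.48; when every point lies inside A, the implied level is negative and the censored estimate is returned with level 0.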

It was precisely this behavior that led us to give the name "smart" to our
estimate, in order to stress this suggestive property of choosing between two
estimators. Whenever we make reference to the global problem, including
the estimation of the contamination level, we will also use smart estimate
to refer to the pair $(\hat\theta_{s,n}, \hat\pi_n)$, where $\hat\pi_n$ is defined as
$\pi^*(\hat\theta_{s,n})$ in (2.14) when it is feasible and as 0 otherwise.

We also stress that under the GEM the consistency of the MLE(s) will imply
that $\hat\pi_{t,n}$ is positive for large n, so that
$(\hat\theta_{t,n}, \pi^*(\hat\theta_{t,n}))$ will asymptotically produce the smart estimator.

Uniqueness of the MLEs in our different schemes is a very distinct task. 
In any case, it should be noticed that the uniqueness of the estimators them- 
selves is not necessary to obtain results on their asymptotic behavior or even 
their BP. In general, the treatment of the uniqueness of the MLE is closely 
related to the exponential family (see [1]), and this is also our approach in 
Theorem 2.1. 

Theorem 2.1. Let $\{f_\theta : \theta \in \Theta\}$ be the density functions of a
d-parameter exponential family with respect to a $\sigma$-finite measure $\lambda$ on
$\mathbb{R}^p$ given by (2.3).

Let $A \in \beta^p$ and P be any probability on $\mathbb{R}^p$ such that $P(A) > 0$ and
$P(|T_j I_A|) < \infty$, $j = 1, \ldots, d$. Assume that neither the $T_j$'s on A
(P-a.s.), nor the $Q_j$'s on $\Theta$ satisfy a linear constraint and let
$P L^s_{\theta,\pi/A}$ be the expected log-likelihood, under P, of (2.6).

Then, there exists at most one solution for the maximization of
$P L^s_{\theta,\pi/A}$ under the restriction $\pi > 0$ and there exists at most one
solution for the maximization of $P L^t_{\theta/A}$. Moreover, if there exists a
solution constrained to $\Theta^{\alpha}_A$, $\hat\theta$, which verifies
$\mathbb{P}_{\hat\theta}(A) > \alpha$ (i.e., it is not in the boundary of
$\Theta^{\alpha}_A$), then it is unique and also solves the unconstrained problem.

As a consequence of Theorem 2.1 and Proposition 3.2, we can assure that 
the MLE(t) exists asymptotically and that it is unique for the exponential 
family. The following corollary particularizes this for the normal family. 

Corollary 2.1. Let $\{\mathbb{P}_\theta : \theta \in \Theta\}$ be the normal p-dimensional
family, and let A be any bounded set whose interior is nonempty. If
$\mathcal{X} = \{x_1, \ldots, x_n\}$ is a data set such that $\mathcal{X} \cap A$ has at least
$p + 1$ points which are in general position, then there exists a unique smart
estimator $(\hat\pi_n, \hat\theta_{s,n})$, at least there exists one MLE(c), and at most
there exists one MLE(t) based on A. Moreover, for every $\alpha \in (0, 1)$ there
exists an MLE(r). In particular, there exists a natural MLE(r) [corresponding to
$\alpha = \mathbb{P}_n(A)$].
2.3. Information matrices. This section ends with the computation of
the information matrices of the proposed estimators. Those results, under
the hypothesis of regularity of the model, will be employed in Theorem 3.3
to obtain the asymptotic distributions.

Regularity of a statistical experiment demands the following (see, e.g.,
page 65 in [15]): (a) continuity of the densities $f_\theta(x)$ on $\Theta$ for
$\lambda^p$-a.e. x; (b) Fisher's finite information at every $\theta \in \Theta$ [i.e.,
differentiability of the function $f_\theta^{1/2}(\cdot)$ in $L_2(\lambda^p)$ at every
point $\theta \in \Theta$], and (c) continuity in the space $L_2(\lambda^p)$ of this
differential function for every $\theta \in \Theta$.

In order to guarantee the regularity of the elliptical model, we could resort
to the minimal conditions given by Bickel (see pages 96-98 in [3]), consisting
in the absolute continuity of g and the finiteness of the integral

$\int r^{p+1} (1 + r^2) \left(\frac{g'(r^2)}{g(r^2)}\right)^2 g(r^2) \, dr$.

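Under the regularity of the statistical experiment, Lemma 7.2 in [15] shows
that for any function T, such that $\mathbb{P}_\theta T^2$ is bounded in a neighborhood,
$V_{\theta_0}$, of $\theta_0 \in \Theta$, the function $\theta \to \mathbb{P}_\theta(T)$ is
continuously differentiable in $V_{\theta_0}$ and

(2.15) $\mathbb{P}_\theta\left(T \frac{\partial}{\partial\theta} \log(f_\theta)\right) = \frac{\partial}{\partial\theta} \mathbb{P}_\theta(T)$.

In particular, we have $\mathbb{P}_\theta\left(\frac{\partial}{\partial\theta} \log(f_\theta)\right) = 0$.

These relations and easy computations (which we omit), which take into account
facts such as

$\frac{\partial}{\partial\theta} \log \mathbb{P}_\theta(A) = \frac{\frac{\partial}{\partial\theta} \mathbb{P}_\theta(A)}{\mathbb{P}_\theta(A)} = \frac{1}{\mathbb{P}_\theta(A)} \mathbb{P}_\theta\left(I_A \frac{\partial}{\partial\theta} \log f_\theta\right)$,

lead to the following propositions on the information matrices of our models.
Notice that (except in Proposition 2.10) the involved results do not depend
on the elliptical hypothesis. Proposition 2.9 also relates the information
matrices based on the original, the censored and the truncated models, which
we respectively denote by $I(\theta)$, $I_c(\theta, A)$, $I_t(\theta, A)$.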

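Proposition 2.9. Under the regularity of the model $\{\mathbb{P}_\theta : \theta \in \Theta\}$
defined by the density functions $\{f_\theta : \theta \in \Theta\}$, the information
matrices corresponding to the censored and truncated likelihood functions based
on a set A verify the relations

(2.16) $I_t(\theta, A) = \mathbb{P}^t_{\theta,A}\left(\left(\frac{\partial}{\partial\theta} \log f_\theta\right)\left(\frac{\partial}{\partial\theta} \log f_\theta\right)^T\right) - \left(\frac{\partial}{\partial\theta} \log \mathbb{P}_\theta(A)\right)\left(\frac{\partial}{\partial\theta} \log \mathbb{P}_\theta(A)\right)^T$,

(2.17) $I_c(\theta, A) = \mathbb{P}_\theta(A) I_t(\theta, A) + \frac{\left(\frac{\partial}{\partial\theta} \mathbb{P}_\theta(A)\right)\left(\frac{\partial}{\partial\theta} \mathbb{P}_\theta(A)\right)^T}{\mathbb{P}_\theta(A)(1 - \mathbb{P}_\theta(A))} = I(\theta) - \mathbb{P}_\theta(A^c) I_t(\theta, A^c)$.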
Remark 2.1. The information matrices above are obtained from different
probability models. However, in our setup, censoring or truncation
are artificial. This means that, in fact, we will know the size of our data
sample, and thus, truncation must be understood as a way of handling the
data outside the trimming set, but not as a way of wrongly reconsidering
the data size. Therefore, we must take into account the original data size for
a correct analysis of the information given from a complete sample through
both procedures.

This leads to the consideration of either the conditional (to the number
of observations that belong to A) truncated information, or the expected
truncated information. In the first case, we would associate the information
matrix $k_n I_t(\theta, A)$ to a sample of size n with $k_n$ elements in A, while in the
second we should associate that given by $n \mathbb{P}_\theta(A) I_t(\theta, A)$. This last
point of view means, in fact, that in our model of complete data the truncated
information should be $I_t^*(\theta, A) = \mathbb{P}_\theta(A) I_t(\theta, A)$, leading (2.17)
to the equivalent relation $I_c(\theta, A) = I(\theta) - I_t^*(\theta, A^c)$. Of course, the
Law of Large Numbers guarantees that both definitions give the same asymptotic
value.

It should also be stressed that the information obtained with censoring
is always greater than that expected with truncation, as trivially arises from
(2.16).
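
This comparison can be checked numerically in a toy case. The sketch below assumes the univariate location model $N(\mu, 1)$ evaluated at $\mu = 0$, with $A = (a, b)$, where all quantities have closed forms; the model, the interval and the function name are assumptions made for the example. The informations are computed directly from the definitions of the censored and truncated models (variance of the corresponding score), using $\int_a^b x^2 \varphi(x)\,dx = [\Phi(x) - x\varphi(x)]_a^b$.

```python
from statistics import NormalDist

STD = NormalDist()


def informations(a, b):
    """Informations for the location of N(mu, 1) at mu = 0 with A = (a, b).

    Returns P(A), d/dmu P(A), the full information, the censored information,
    and the truncated informations based on A and on its complement.
    """
    p = STD.cdf(b) - STD.cdf(a)
    d = STD.pdf(a) - STD.pdf(b)  # derivative of P_mu(A) with respect to mu at 0
    # Second moment of the score x restricted to A.
    e_in = (STD.cdf(b) - b * STD.pdf(b)) - (STD.cdf(a) - a * STD.pdf(a))
    i_full = 1.0
    i_t = e_in / p - (d / p) ** 2                       # truncated model on A
    i_c = e_in + d * d / (1.0 - p)                      # censored model on A
    i_t_out = (i_full - e_in) / (1.0 - p) - (d / (1.0 - p)) ** 2  # truncated on A^c
    return p, d, i_full, i_c, i_t, i_t_out
```

For $A = (-1, 2)$ this gives a censored information of about 0.66, against an expected truncated information $\mathbb{P}_\theta(A) I_t(\theta, A)$ of about 0.43 and a full information of 1.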

Moreover, (2.16) and Proposition 2.10 also show that in the elliptical
model, for sets A taken as (scaled versions of) the MVE of $\mathbb{P}_\theta$, both
information matrices coincide for all the parameters related to the location
and shape of the distribution, and only differ for the scale parameter. In
other words, if we reparameterize $\Sigma$ as $H = \Sigma / \varsigma^2$,
$\varsigma^2 = |\Sigma|^{1/p}$, for the (scaled versions of) the MVE of the elliptical
probability $\mathbb{P}_\theta$, the only different component in the information matrices
$I_c(\theta, A)$ and $I_t^*(\theta, A)$ is that corresponding to the scale parameter
$\varsigma$ analyzed in [11].



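Proposition 2.10. Assume regularity of the elliptical model $\{\mathbb{P}_\theta : \theta \in
\Theta\}$. Let $\Sigma$ be reparameterized by $H = \Sigma / \varsigma^2$,
$\varsigma^2 = |\Sigma|^{1/p}$. Then, for every $\theta_0 = (\mu_0, \Sigma_0) \in \Theta$ and
every $r > 0$, the following relations hold:

$\frac{\partial}{\partial\mu} \mathbb{P}_\theta(\mathcal{E}((\mu_0, \Sigma_0), r))\Big|_{\theta = \theta_0} = 0$,

$\frac{\partial}{\partial H} \mathbb{P}_\theta(\mathcal{E}((\mu_0, \Sigma_0), r))\Big|_{\theta = \theta_0} = 0$.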



In the GEM, the information matrix $I_s(\eta, A)$, where $\eta = (\pi, \theta)$, is
composed of a sub-matrix corresponding to the parameter $\theta$, a term
corresponding to $\pi$ and $p(p + 1)/2$ terms (i.e., the same number as the dimension
of $\theta$) corresponding to the cross terms between $\theta$ and $\pi$. We will,
respectively, denote them by $I_s(\theta, A)$, $I_s(\pi, A)$ and $I_s(\theta_i, \pi, A)$.
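Proposition 2.11. Under the regularity of the model defined by the
density functions $\{f_\theta : \theta \in \Theta\}$, the information matrix for the GEM (2.7)
verifies

(2.18)
$\mathcal{I}_s(\theta,A) = (1-\pi)\left[P_\theta(A)\,\mathcal{I}_t(\theta,A) + \frac{\frac{\partial}{\partial\theta}P_\theta(A)\,\left(\frac{\partial}{\partial\theta}P_\theta(A)\right)^T}{P_\theta(A)\,(1-(1-\pi)P_\theta(A))}\right],$

$\mathcal{I}_s(\pi,A) = \frac{P_\theta(A)}{(1-\pi)(1-(1-\pi)P_\theta(A))},$

$\mathcal{I}_s(\theta^i,\pi,A) = -\frac{\frac{\partial}{\partial\theta^i}P_\theta(A)}{1-(1-\pi)P_\theta(A)}, \quad i = 1,\dots,p(p+1)/2.$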

Remark 2.2. Since we will often be interested only in the $\theta$ parameters,
it is natural to explore what is the information for $\theta$ in the GEM
(2.7), treating $\pi$ as a nuisance parameter. According to the well-known block
matrix form of matrix inverses, the block of the inverse matrix of $\mathcal{I}_s(\eta,A)$
corresponding to the $\theta$ parameters can be expressed as

(2.19) $\left(\mathcal{I}_s(\theta,A) - \frac{(1-\pi)(1-(1-\pi)P_\theta(A))}{P_\theta(A)}\;\frac{\frac{\partial}{\partial\theta}P_\theta(A)}{1-(1-\pi)P_\theta(A)}\;\frac{\left(\frac{\partial}{\partial\theta}P_\theta(A)\right)^T}{1-(1-\pi)P_\theta(A)}\right)^{-1};$

hence, the matrix between the large parentheses is considered as the
information for $\theta$.

From (2.18), it is straightforward that this information coincides with
$(1-\pi)P_\theta(A)\mathcal{I}_t(\theta,A)$. This agrees with our considerations in Remark 2.1
and the second item in Proposition 2.7: Taking into account that the
truncated model associated to the GEM on $A$ coincides with the truncated model
(on $A$) associated to the uncontaminated model $\{P_\theta : \theta \in \Theta\}$, the information
matrix for the GEM must coincide with that of the truncated model
corrected through a suitable factor. The expected number of sample data
points in $A$ obtained from a random sample of size $n$ from the contaminated
probability $P_{\eta_0} = (1-\pi_0)P_{\theta_0} + \pi_0 Q$, where $Q$ is any probability with support
in $A^c$, is precisely $n(1-\pi_0)P_{\theta_0}(A)$; thus, the truncated information obtained
from one observation from the original GEM should be

(2.20) $\mathcal{I}_r^*(\theta_0, A) = (1-\pi_0)P_{\theta_0}(A)\,\mathcal{I}_t(\theta_0, A).$

This also supports that the MLE(s) coincides with the MLE(t) asymptotically.
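The reduction of the GEM information for the parameter of interest to a multiple of the truncated information can also be checked numerically. The following sketch is a one-dimensional illustration under explicit assumptions, not the paper's computation: the uncontaminated model is N(0, sigma^2) at sigma = 1, A = [-c, c], and the per-observation GEM log-likelihood is taken as I_A log((1 - pi) f) + I_{A^c} log(1 - (1 - pi) P(A)), the form that appears in (A.8). The 2 x 2 information matrix for (sigma, pi) is built from the expected products of the scores, and its Schur complement is compared with (1 - pi) P(A) I_t. All function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def theta_information_in_gem(c, pi):
    """Return (Schur complement for sigma, (1 - pi) * P(A) * I_t) at sigma = 1."""
    score = lambda x: x * x - 1.0                       # d/d sigma of log f at sigma = 1
    p_a = simpson(phi, -c, c)                           # P(A) under the clean model
    d_a = simpson(lambda x: score(x) * phi(x), -c, c)   # d/d sigma of P(A)
    m_in = simpson(lambda x: score(x) ** 2 * phi(x), -c, c)
    q = (1.0 - pi) * p_a                                # probability of observing a point in A

    # Scores are constant outside A because that mass is lumped together.
    s_sigma_out = -(1.0 - pi) * d_a / (1.0 - q)
    s_pi_in = -1.0 / (1.0 - pi)
    s_pi_out = p_a / (1.0 - q)

    # Entries of the information matrix as expected products of scores.
    i_ss = (1.0 - pi) * m_in + (1.0 - q) * s_sigma_out ** 2
    i_pp = q * s_pi_in ** 2 + (1.0 - q) * s_pi_out ** 2
    i_sp = (1.0 - pi) * d_a * s_pi_in + (1.0 - q) * s_sigma_out * s_pi_out

    schur = i_ss - i_sp ** 2 / i_pp                     # information for sigma, pi as nuisance
    i_t = (m_in - d_a ** 2 / p_a) / p_a                 # truncated information on A
    return schur, (1.0 - pi) * p_a * i_t


schur, target = theta_information_in_gem(c=2.0, pi=0.1)
```

The two returned values agree up to rounding in this toy setting, which is the scalar analogue of the block-inverse argument of Remark 2.2.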

3. Robustness and asymptotics of the estimators. In the finite sample
setting, the robustness of an estimator is usually measured through its
(finite sample) BP, which for an estimator $T_n$, based on a sample $\mathcal{X}_n$, will be
denoted as $\varepsilon^*(T_n, \mathcal{X}_n)$. Of course, the BP makes no sense if we are only able
to assure the asymptotic existence of an estimator. In fact, its analysis is
closely related to arguments on the existence of the estimator. In our case
we have shown in Propositions 2.5 and 2.6 the existence of the MLE(s),
MLE(c) and MLE(r) under very general hypotheses. If our initial estimator
is equivariant, the ellipsoid on which we base our ML (final) estimation
will also be equivariant and the whole procedure will obviously maintain
the equivariance property. Moreover, as stated in Theorem 3.1, our one-step
procedures also preserve the initial BP. In fact, by merging the arguments in
Section 5 in [19] with those used in the discussion showing the existence of
our estimators, it is straightforward to show the following theorem.

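Theorem 3.1. Let $\mathcal{X} = \{x_1, \dots, x_n\} \subset \mathbb{R}^p$, $n > p$, be a sample of points
in general position. Let $t_n$ and $C_n$ be estimates of location and covariance.
Let

$A_n := \{x \in \mathbb{R}^p : (x - t_n)^T C_n^{-1}(x - t_n) < c_1\},$

where $c_1$ is any fixed value such that the set $A_n$ contains at least
points of $\mathcal{X}$.

If the hypotheses in Proposition 2.5 are satisfied and $\hat\theta_{s,n}$, $\hat\theta_{c,n}$ and $\hat\theta_{r,n}$
are respectively the MLE(s), MLE(c) and MLE(r) based on $A_n$, then

$\min\{\varepsilon^*(\hat\theta_{s,n},\mathcal{X}), \varepsilon^*(\hat\theta_{c,n},\mathcal{X}), \varepsilon^*(\hat\theta_{r,n},\mathcal{X})\} \ge \min\{\varepsilon^*(t_n,\mathcal{X}), \varepsilon^*(C_n,\mathcal{X})\}.$

In particular, when $t_n$ and $C_n$ are the MVE-based estimators, then

$\varepsilon^*(\hat\theta_{s,n},\mathcal{X}) = \varepsilon^*(\hat\theta_{c,n},\mathcal{X}) = \varepsilon^*(\hat\theta_{r,n},\mathcal{X}) = [(n-p+1)/2].$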
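As a concrete picture of the trimming set used in the theorem, the following sketch selects the subsample inside an ellipsoid defined by a location estimate, a covariance estimate and a cutoff, through the squared Mahalanobis distance. It is a minimal illustration in dimension p = 2; the helper names `inverse_2x2`, `mahalanobis_sq` and `trimming_subsample` are hypothetical, and the initial estimates are simply passed in rather than computed by any robust procedure.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists; requires a positive determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance estimate must be positive definite")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, t, c_inv):
    """Squared Mahalanobis distance (x - t)^T C^{-1} (x - t)."""
    dx = [xi - ti for xi, ti in zip(x, t)]
    k = len(dx)
    return sum(dx[i] * c_inv[i][j] * dx[j] for i in range(k) for j in range(k))


def trimming_subsample(points, t, cov, cutoff):
    """Points of the sample lying in {x : (x - t)^T cov^{-1} (x - t) < cutoff}."""
    c_inv = inverse_2x2(cov)
    return [x for x in points if mahalanobis_sq(x, t, c_inv) < cutoff]


sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
kept = trimming_subsample(sample, t=(0.0, 0.0), cov=[[1.0, 0.0], [0.0, 1.0]], cutoff=4.0)
```

The one-step estimators are then computed from the retained points together with whatever information about the discarded ones the chosen likelihood (truncated, censored or GEM) uses.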

In order to obtain the Influence Functions (IF) of our estimators, we
will begin with a fixed ellipsoid $A = \mathcal{E}(\gamma)$, $\gamma \in \Gamma$, and emphasize the
dependence on the parameter $\gamma$. In this case the IF's of our estimators can
be obtained as the IF's of M-estimators. Thus, after Section 2.3, under the
usual conditions to allow for interchanging differentiation and integration
[recall relation (2.15) obtained from the regularity of the model], and under
the assumed model, provided that the involved information matrices are
nonsingular, we obtain

(3.1) $\mathrm{IF}(x, \hat\theta_{*,n}(\gamma), \theta_0) = -\left(P_{\theta_0}\dot h^*_{\theta_0,\gamma}\right)^{-1} h^*_{\theta_0,\gamma}(x),$

where $\hat\theta_{*,n}(\gamma) = \hat\theta_{t,n}(\gamma)$ or $\hat\theta_{c,n}(\gamma)$ and $h^*_{\theta,\gamma} = h^t_{\theta,\gamma}$ or $h^c_{\theta,\gamma}$, defined by



V7 - ^^^(7) - ^^(^(^^^(7)- 
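On the other hand, under the GEM, by defining

$h^s_{\theta,\pi,\gamma}(x) := \begin{pmatrix} \frac{\partial}{\partial\theta}\log f_\theta(x)\, I_{\mathcal{E}(\gamma)}(x) - \frac{(1-\pi)\frac{\partial}{\partial\theta}P_\theta(\mathcal{E}(\gamma))}{1-(1-\pi)P_\theta(\mathcal{E}(\gamma))}\, I_{\mathcal{E}(\gamma)^c}(x) \\ -\frac{1}{1-\pi}\, I_{\mathcal{E}(\gamma)}(x) + \frac{P_\theta(\mathcal{E}(\gamma))}{1-(1-\pi)P_\theta(\mathcal{E}(\gamma))}\, I_{\mathcal{E}(\gamma)^c}(x) \end{pmatrix}$

and recalling the information matrix $\mathcal{I}_s(\eta, \mathcal{E}(\gamma))$ (see Proposition 2.11), we
obtain

(3.2) $\mathrm{IF}(x, \hat\theta_{s,n}(\gamma), \hat\pi_{s,n}(\gamma), \theta_0, \pi_0) = \left(\mathcal{I}_s(\eta_0, \mathcal{E}(\gamma))\right)^{-1} h^s_{\theta_0,\pi_0,\gamma}(x).$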

Because of the continuity of the estimators with respect to $\gamma$, it is easy
to see that the IF of the estimator $\hat\theta_{*,n}(\gamma_n)$ coincides with that of $\hat\theta_{*,n}(\gamma)$
if $\{\gamma_n\}_n \subset \Gamma$ and $\gamma_n \to \gamma \in \Gamma$, if we apply the main idea in the proof of
Theorem B.1 in [10] to the points that do not belong to the boundary of $\mathcal{E}(\gamma)$.
Therefore, the IF of the one-step (truncated, censored or smart) estimator
based on the MVE estimators will be the one given by (3.1), or (3.2) with
$\mathcal{E}(\gamma)$ being the MVE of $P$, where $P$ belongs to an elliptical model [or to the
GEM model given by (2.2) for the elliptical model].

Of course, the asymptotic variances computed from the information
matrices $\mathcal{I}_c(\theta_0,\mathcal{E}(\gamma))$, $\mathcal{I}_t^*(\theta_0,\mathcal{E}(\gamma))$ and $\mathcal{I}_r^*(\theta_0,\mathcal{E}(\gamma))$, taking into account
Remarks 2.1 and 2.2, and by integration of the square of the relations (3.1)
and (3.2), coincide.

3.1. Strong consistency. To explore the asymptotic behavior of our esti- 
mators, we begin with the consistency of the initial estimator. We will show 
that any initial consistent estimator under the model would give the same 
asymptotic behavior. The consistency of the MVE in the uncontaminated 
model has already been treated in [7]. However, under the uniqueness of the 
theoretical MVE, it is not difficult to show the following proposition that 
covers the GEM. 

Proposition 3.1. Let $g$ be a decreasing function which defines an
elliptical family on $\mathbb{R}^p$. Let $\{X_n\}_n$ be a random sample obtained from the
distribution

(3.3) $P = (1-\pi_0)P_{\theta_0} + \pi_0 Q$

in the GEM of the elliptical family defined by $g$ with $\pi_0 < 1/2$, let $A =
\mathcal{E}(\mu,\Sigma,r)$ be the MVE of $P$, which we assume to be unique, and $A_n =
\mathcal{E}(\mu_n,\Sigma_n,r_n)$ be the sample MVE. Then we have that $\lim_n I_{A_n} = I_A$ a.s.



Now, we are in a position to prove the consistency of the smart estimate 
under the GEM. 
Theorem 3.2 (Consistency of the estimators). Let $g$ be a function which
defines an elliptical family on $\mathbb{R}^p$ and satisfies G1, G2 and Gp. Let $\{X_n\}_n$
be a random sample taken from the distribution

$P = (1-\pi_0)P_{\theta_0} + \pi_0 Q$

in the GEM of the elliptical family defined by $g$, $0 \le \pi_0 < 1/2$ and $\theta_0 \in \Theta$.

Let $A$ be an MVE of $P$, which we assume to be unique. Let $\{A_n\}_n$ be
a sequence of empirical MVE's and let $\{(\pi_n, \mu_n, \Sigma_n)\}_n$ be a sequence of
MLE(s) based on the ellipsoids $\{A_n\}_n$. Then, the following is satisfied:

1. The MLE(s) based on $\{A_n\}_n$ is strongly consistent.

2. If $\pi_0 = 0$, then the MLE(c) based on $\{A_n\}_n$ is strongly consistent.

3. If $\alpha \in (0, 1/2)$, then the MLE(r), $\hat\theta_{r,n}$, based on $\{A_n\}_n$ under the
restrictions given by $\Theta^\alpha_{A_n}$, is strongly consistent. Moreover, if $\pi_n$ is computed
from $\hat\theta_{r,n}$ using (2.14), then also $\pi_n \to \pi_0$ a.s.

Proposition 3.2 shows that, for a large enough sample size, the restricted 
parameter set contains the true value of the parameter. Thus, for large sizes 
these restrictions are, in fact, superfluous. 

Proposition 3.2. Assume the hypotheses of Theorem 3.2. Let $\alpha \in (0, 1/2)$
be given and let $\Theta^\alpha_{A_n}$ be defined as in (2.13). Then, for a.e. sample there
exists $\delta > 0$ such that $\{\theta : \|\theta - \theta_0\| < \delta\} \subset \Theta^\alpha_{A_n}$, for large enough $n$.

3.2. Asymptotic distribution. Although the extension of the argmax-based
arguments of the Empirical Processes Theory to the semiparametric
framework is certainly not trivial, our model is well suited for such a
task, because of the special features of the family of ellipsoids
parameterized through the set $\Gamma$. In fact, Section 3.2.4 of [26] can easily be tuned to
cover our setup by repeating verbatim the reasoning therein in order to get
the chain of results on linearization given in the Appendix as well as their
consequences.

We only consider in some detail the MLE(r), which needs some additional
analysis. Let $\alpha \in (0,1/2)$, and let $\{\gamma_n\}$ be the sequence of parameters
associated to the sequence of sample MVE's. We initially assume the
hypotheses in Lemma A.7, as well as the regularity of the underlying elliptical
model. After the consistency results, for the analysis of the asymptotic
distribution, we can assume that the $\gamma$-parameters belong to a compact subset
$K$ of $\Gamma$, and that the $\theta$-parameters verify the restrictions given by $\Theta^\alpha_{\mathcal{E}(\gamma_n)}$
and belong to the set $\{\theta : \|\theta - \theta_0\| < \delta\}$ for some $\delta > 0$ and large enough $n$.
Let us consider the function $m_{\theta,\gamma}$, associated to the MLE(r), given by

(3.4) $m_{\theta,\gamma}(x) := I_{\mathcal{E}(\gamma)}(x) \log\left(\frac{g((x-\mu)^T\Sigma^{-1}(x-\mu))}{\int_{\mathcal{E}(\gamma)} g((y-\mu)^T\Sigma^{-1}(y-\mu))\,dy}\right).$
Lemma 3.1 allows us to apply Theorem A.1 under the condition required
in Lemma A.8.

Lemma 3.1. Let us assume that $g$ is twice continuously differentiable.
Let $\theta_0 = (\mu_0, \Sigma_0) \in \Theta$, $\gamma_0 \in \Gamma$ be such that

$\inf_{x \in \mathcal{E}(\gamma_0)} g((x-\mu_0)^T\Sigma_0^{-1}(x-\mu_0)) > 0.$

Then, there exist a vector valued function $\dot m_{\theta_0,\gamma}$, $\delta > 0$ and a compact
neighborhood $K$ of $\gamma_0$ such that

(3.5) $\left\{\frac{m_{\theta,\gamma} - m_{\theta_0,\gamma} - (\theta-\theta_0)^T \dot m_{\theta_0,\gamma}}{\|\theta-\theta_0\|} : \|\theta-\theta_0\| < \delta,\ \gamma \in K\right\}$

is $P$-Donsker and

(3.6) $P\left(m_{\theta,\gamma} - m_{\theta_0,\gamma} - (\theta-\theta_0)^T \dot m_{\theta_0,\gamma}\right)^2 = o(\|\theta-\theta_0\|^2),$

uniformly in $\gamma \in K$.

If the matrix of second derivatives is continuous and nonsingular, relation
(A.24) in Lemma A.8 and the consistency of the MVE's produce the
asymptotic laws of the MLE(s) at the announced rate $n^{1/2}$, independently
of the rate of convergence of the initial estimator.

The analogous result for the MLE(c) under the elliptical model follows
from similar considerations. Finally, under a probability in the GEM of
the elliptical model with $\pi > 0$, the consistency of the MLE(s) assures, from
Proposition 2.7, that the MLE(s) coincides with the natural MLE(r)
asymptotically. Thus, they share their asymptotic normal distribution, with the
covariance matrices related to the information matrices already obtained.
We summarize these results in the following final theorem.

Theorem 3.3 (Asymptotic distributions). Assume the hypothesis in
Lemma A.1 and that $g$ is twice continuously differentiable. Let $A$ be the
(only) MVE of $P$. For each $n \in \mathbb{N}$, consider an estimation, $A_n$, obtained
through a consistent estimator of the MVE; the MLE(r), $\hat\theta_{r,n}$, under the
restriction defined by $\Theta^\alpha_{A_n}$ for some $\alpha \in (0, 1/2)$, as well as the MLE(s), $\hat\theta_{s,n}$,
and the MLE(c), $\hat\theta_{c,n}$, based on $A_n$.

If the corresponding information matrices are nonsingular, then:

1. $\sqrt{n}(\hat\theta_{r,n} - \theta_0)$ converges in law to a centered multivariate normal
distribution with covariance matrix given by the inverse of the information
matrix, $\mathcal{I}_r^*(\theta_0,A)$, defined through (2.20) and Proposition 2.9.

2. If $\pi_0 = 0$, $\sqrt{n}(\hat\theta_{c,n} - \theta_0)$ converges in law to a centered multivariate normal
distribution with covariance matrix given by the inverse of the information
matrix, $\mathcal{I}_c(\theta_0,A)$, defined in Proposition 2.9.

3. If $\pi_0 > 0$, $\sqrt{n}(\hat\theta_{s,n} - \theta_0)$ converges in law to a centered multivariate normal
distribution with covariance matrix given in (2.19).

4. Discussion. A consideration on the efficiency of the obtained
estimators can be illuminating. Note that the rate of convergence is always $n^{1/2}$,
but also that the asymptotic law of the estimators depends on the limit
ellipsoid but not on the rate of convergence of the initial estimator to this
ellipsoid. In fact, from the asymptotic results and the expressions of the
information matrices, it becomes apparent that the efficiency is equivalent
to that obtained from the corresponding MLE computed on the theoretical
(enlarged) MVE. Therefore, it is greater than that obtained by the usual
one-step reweighting, even for initial estimators that converge faster than
the MVE estimator.

Under the elliptical model, any high-BP consistent initial estimator $(t_n, C_n)$
of $\theta = (\mu, \Sigma)$ could be used to produce our estimation $A_n := \mathcal{E}(t_n, C_n, r_{n,\alpha})$
of the central ellipsoid $A = \mathcal{E}(\mu, \Sigma, r_\alpha)$, covering, say, $1 - \alpha = 95\%$ of
the theoretical distribution. Among our proposals, the MLE(c) based on
$A_n$ will provide maximum efficiency and the same BP as $(t_n, C_n)$. We recall
that, according to Remark 2.1, the gain of efficiency with respect to the
MLE(t) appears only in the estimation of $\Sigma$; thus, the MLE(t) and MLE(c)
of $\mu$ based on $A_n$ have the same efficiency. Since it is usual to justify the use
of robust estimators by looking at the behavior under the model (i.e.,
assuming the existence of no contamination), the greater efficiency of the MLE(c)
under the elliptical model would justify its preferential use.

In Tables 1, 2 and 3 (see Appendix B) we present the asymptotic
efficiencies of these estimators under the uncontaminated elliptical model and
their versions based on enlarged MVE estimations. The comparison of the
efficiencies in these tables with other well-known robust estimators (see, for
instance, the efficiencies obtained in [5] or [4]) shows that the combination
"Initial MVE estimator" + "Scaled version for a given $\alpha$" + "MLE(c)" gives
better efficiencies among the highest-BP equivariant estimators.

In the contaminated model, it is intuitively sound that the best choice 
for an estimator based on a subsample which contains no outlier should 
be the MLE(t). In this sense the MLE(s) is the natural MLE, because it 
only substitutes the estimation provided by the MLE(t) when it does not 
exist or the sample does not sufficiently match the GEM. In some way it 
also robustifies the MLE(t) that, as already noted, possibly does not even 
exist. This nonexistence is in apparent contradiction with Theorem 1 in [20], 
but the BP studied in this theorem is not the sample-based one and does 
not reflect the possible nonexistence of the MLE(t), which could make it 
undesirable from the robustness point of view. The MLE(r) would be an 
excellent alternative, taking into account the choice of the initial trimmed 
sets. 
In the presence of outliers, our choice of the MVE as initial estimator to
produce the trimmed set is related to the GEM model, which is based on 
the possibility of discarding the outliers by resorting to a common central 
ellipsoid of the contaminated and uncontaminated models. Since our pro- 
posals circumvent the drawback of its convergence rate, this choice stresses 
the improvement of efficiency obtained through the presented methodology. 

The literature on robust estimation in the elliptical model usually
analyzes the estimation of $|\Sigma|$ a posteriori, by adjusting on the basis of the
model and the estimates of location and shape. This generally leads to a
Fisher inconsistent estimation under a real contaminated model, even in the
considered GEM in which only outliers contaminate the distribution. On the
contrary, our proposals are in their own right MLE, even for the size of $\Sigma$,
and only the MLE(c) would be Fisher inconsistent (when $\pi_0 > 0$).

In the applications, every proposal can be computed through a variant of 
the EM algorithm (see Section 4.2 in Dempster, Laird and Rubin [9]) and 
based on the improved MVE given in (6.59) in [22]. The variant of the EM 
algorithm can be based on a Monte Carlo approximation to the integrals 
using a random sample from the appropriate elliptical distribution, while 
in the M step we need to solve the estimation problem for the original 
(nontruncated, noncensored) elliptical distribution. 
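The Monte Carlo variant of the EM algorithm described above can be sketched in the simplest setting. The block below is only an illustration under stated assumptions, not the authors' implementation: the model is a univariate normal instead of a general elliptical family, the trimming set is a fixed interval [a, b], and the likelihood is the censored one (observed values inside the interval plus the count of points outside). The E step approximates the conditional moments of a point outside the interval by simulation from the current fit; the M step is the usual complete-data normal MLE. The function name and its parameters are hypothetical.

```python
import math
import random


def mc_em_censored_normal(inside, n_out, a, b, iters=25, draws=5000, seed=0):
    """Monte Carlo EM for N(mu, sigma^2) when only points in [a, b] are observed
    and n_out further points are known to lie outside [a, b]."""
    rng = random.Random(seed)
    n = len(inside) + n_out
    s1 = sum(inside)
    s2 = sum(x * x for x in inside)
    # Start from the naive moments of the retained points.
    mu = s1 / len(inside)
    sigma = math.sqrt(max(s2 / len(inside) - mu * mu, 1e-12))
    for _ in range(iters):
        # E step: simulate from the current fit and keep the draws outside [a, b].
        out = [x for x in (rng.gauss(mu, sigma) for _ in range(draws))
               if not (a <= x <= b)]
        if out:
            e1 = sum(out) / len(out)
            e2 = sum(x * x for x in out) / len(out)
        else:
            # Crude safeguard when no simulated point falls outside the interval.
            e1, e2 = mu, mu * mu + sigma * sigma
        # M step: complete-data normal MLE with imputed sufficient statistics.
        mu = (s1 + n_out * e1) / n
        sigma = math.sqrt(max((s2 + n_out * e2) / n - mu * mu, 1e-12))
    return mu, sigma


# Usage on simulated standard normal data trimmed to [-1.5, 1.5].
gen = random.Random(1)
data = [gen.gauss(0.0, 1.0) for _ in range(4000)]
inside = [x for x in data if -1.5 <= x <= 1.5]
mu_hat, sigma_hat = mc_em_censored_normal(inside, len(data) - len(inside), -1.5, 1.5)
```

Starting from the moments of the retained points, which underestimate the scale, the iterations move the scale estimate back toward the value of the complete model; the Monte Carlo step keeps a small simulation noise in the final values.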

APPENDIX A: PROOFS AND SOME TECHNICAL RESULTS 
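Proof of Proposition 2.1. If $f_{\theta_1} = f_{\theta_2}$ $\lambda$-a.s. on $A$, then

$\lambda\left\{x \in A : \sum_j (Q_j(\theta_1) - Q_j(\theta_2))T_j(x) = \log(C(\theta_1)/C(\theta_2))\right\} = \lambda(A),$

and the $T$'s would satisfy a linear constraint on $A$ with $\lambda$-positive measure.
□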

Proof of Proposition 2.2. Let $\theta = (\mu, \Sigma) \in \Theta$ be such that, for some
$k$, it satisfies

(A.1) $P_{\theta_0}[C_k] = P_{\theta_0}[A],$

where $C_k = \{x \in A : f_\theta(x) = k f_{\theta_0}(x)\}$.

Assume that $\mu \ne \mu_0$ and let $\varepsilon > 0$ be such that $B(\mu_0,\varepsilon) \subset A$ and $f_{\theta_0} > 0$ on
$B(\mu_0,\varepsilon)$. Then $\varepsilon^* = \inf(\|\mu - \mu_0\|, \varepsilon) > 0$. For every $x \in S(\mu_0,\varepsilon^*) \cap \{y : \langle y - \mu, \mu_0 - \mu\rangle > 0\}$,
let $[x,\mu_0] \subset \mathbb{R}^p$ be the segment joining $x$ with $\mu_0$.

The function $f_{\theta_0}$ (resp. $f_\theta$) increases (resp. decreases) on the segment
from $x$ to $\mu_0$. Because of G1, $f_{\theta_0}$ is not constant on this segment. Thus, if
we denote by $\lambda_1$ the (one-dimensional) Lebesgue measure on $[x,\mu_0]$, then

$\lambda_1\left(\{f_\theta \ne k f_{\theta_0}\} \cap [x,\mu_0]\right) > 0,$

which makes (A.1) impossible.

This implies that $\mu = \mu_0$. Moreover, from (A.1) there exists a sequence
$\{x_n\} \subset C_k - \{\mu_0\}$, such that $\lim_n x_n = \mu_0$ and, since (from G1) $g(0+) > 0$, we
obtain

$k = \lim_n \frac{f_\theta(x_n)}{f_{\theta_0}(x_n)} = \left(\frac{|\Sigma_0|}{|\Sigma|}\right)^{1/2} \lim_n \frac{g((x_n-\mu_0)^T\Sigma^{-1}(x_n-\mu_0))}{g((x_n-\mu_0)^T\Sigma_0^{-1}(x_n-\mu_0))} = \left(\frac{|\Sigma_0|}{|\Sigma|}\right)^{1/2}.$

In other words, if $x \in C_k$, we have

$g((x - \mu_0)^T\Sigma^{-1}(x - \mu_0)) = g((x - \mu_0)^T\Sigma_0^{-1}(x - \mu_0)).$

On the other hand, let $x \in A$ be such that $f_{\theta_0}(x) < f_{\theta_0}(\mu_0)$ and let

$t_x = \sup\{t > 0 : g(t) > |\Sigma_0|^{1/2} f_{\theta_0}(x)\}.$

Because of G1, $t_x > 0$. Moreover, taking into account that $A$ is open and
(A.1), we have that there exist two sequences $\{x_n\}$, $\{y_n\} \subset C_k$ such that

(A.2) $\lim_n (x_n - \mu_0)^T\Sigma_0^{-1}(x_n - \mu_0) = \lim_n (y_n - \mu_0)^T\Sigma_0^{-1}(y_n - \mu_0) = t_x,$

while $(y_n - \mu_0)^T\Sigma_0^{-1}(y_n - \mu_0) > t_x$, and $(x_n - \mu_0)^T\Sigma_0^{-1}(x_n - \mu_0) < t_x$, for
every $n \in \mathbb{N}$. Therefore, by definition of $C_k$ and $t_x$, it must also happen that

(A.3) $\lim_n (y_n - \mu_0)^T\Sigma^{-1}(y_n - \mu_0) = t_x.$

Without loss of generality, we can assume that the sequence $\{y_n\}_n$ is
convergent. Let $y_x$ be its limit. Thus, from (A.2) and (A.3) we have that

(A.4) $(y_x - \mu_0)^T\Sigma^{-1}(y_x - \mu_0) = t_x$ and $(y_x - \mu_0)^T\Sigma_0^{-1}(y_x - \mu_0) = t_x.$

However, by G1, it is possible to choose $x$ in order to obtain infinitely
many different values for $t_x$ above. This and the freedom we have to choose the
convergent sequence $\{y_n\}$ give that there exists at most one matrix $\Sigma$ which
satisfies the infinite number of relations included in (A.4). Since $\Sigma_0$ satisfies
all these relations, the only possibility is to have $\Sigma = \Sigma_0$. □

Proposition 2.4 employs the following lemma in its proof. 

Lemma A.1. Let $\mathcal{X} := \{x_1, \dots, x_n\}$, where $n > p$, be a set whose points
are in general position. Let $\mathcal{H}$ be the family of all hyperplanes in $\mathbb{R}^p$, and
given $H \in \mathcal{H}$, let us denote the distance from $x_i$ to $H$ by $d_i(H) := \inf\{\|x_i -
h\| : h \in H\}$, $i = 1, \dots, n$.

If $(d_{(1)}(H), d_{(2)}(H), \dots, d_{(n)}(H))$ is the ordered set of the values $d_i(H)$, $i =
1, \dots, n$, then $\inf_{H \in \mathcal{H}} d_{(p+1)}(H) > 0$.
Proof. Every $H \in \mathcal{H}$ is determined by the vector $v \in S_{p-1}$, the unit
sphere in $\mathbb{R}^p$, and the value $b \in \mathbb{R}$ which satisfy $H = \{x \in \mathbb{R}^p : \langle x, v\rangle = b\}$. Let
us denote $H = H_{v,b}$. Also, set $x_i = (x_i^1, \dots, x_i^p)$, for every $i = 1, \dots, n$, $M_n :=
\sup\{|x_i^j| : j = 1, \dots, p$ and $i = 1, \dots, n\}$ and $\mathcal{H}_n = \{(v, b) \in S_{p-1} \times \mathbb{R} : H_{v,b} \cap
[-M_n, M_n]^p \ne \emptyset\}$. Obviously,

$\inf\{d_{(p+1)}(H) : H \in \mathcal{H}\} = \inf\{d_{(p+1)}(H_{v,b}) : (v,b) \in \mathcal{H}_n\}.$

For every $v \in S_{p-1}$ such that there exists $b \in \mathbb{R}$ which satisfies that $(v,b) \in
\mathcal{H}_n$, let us consider the continuous maps

$v \to B^n(v) := \sup\{b \in \mathbb{R} : (v, b) \in \mathcal{H}_n\},$

$v \to B_n(v) := \inf\{b \in \mathbb{R} : (v, b) \in \mathcal{H}_n\}.$

Since $S_{p-1}$ is compact and $\mathcal{H}_n = \bigcup_{v \in S_{p-1}} \{v\} \times [B_n(v), B^n(v)]$, where we
take $[B_n(v), B^n(v)] = \emptyset$ if these maps are not defined, we obtain that $\mathcal{H}_n$ is
compact.

On the other hand, for every $i = 1, \dots, n$, the map $(v,b) \to d_i(H_{v,b})$ is
continuous; hence, $(v,b) \to d_{(p+1)}(H_{v,b})$ is also continuous and reaches its
infimum on $\mathcal{H}_n$, proving the lemma from the general position assumption.
□
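Lemma A.1 can be illustrated in the plane (p = 2), where general position means that no three points are collinear. The sketch below is an illustration under assumptions that are not taken from the paper: it uses the elementary fact that if three points lie within distance d of a line, then the smallest altitude of their triangle is at most 2d, so half of the smallest altitude over all triples is a positive lower bound for the third smallest distance to any line. Randomly generated lines are then compared against this bound. All function names are hypothetical.

```python
import itertools
import math
import random


def distance_to_line(x, v, b):
    """Distance from x to the line {y : <y, v> = b}, for a unit vector v."""
    return abs(x[0] * v[0] + x[1] * v[1] - b)


def third_smallest_distance(points, v, b):
    """d_(p+1)(H) for p = 2: third smallest distance from the points to H."""
    return sorted(distance_to_line(x, v, b) for x in points)[2]


def min_altitude(a, b, c):
    """Smallest altitude of the triangle abc (twice the area over the longest side)."""
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    longest = max(math.dist(a, b), math.dist(b, c), math.dist(a, c))
    return twice_area / longest


def lower_bound(points):
    """Half of the smallest altitude over all triples of points."""
    return min(min_altitude(*t) for t in itertools.combinations(points, 3)) / 2.0


points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
bound = lower_bound(points)

# Compare the bound with the third smallest distance for randomly chosen lines.
rng = random.Random(0)
checks = []
for _ in range(2000):
    angle = rng.uniform(0.0, math.pi)
    v = (math.cos(angle), math.sin(angle))
    b = rng.uniform(-10.0, 10.0)
    checks.append(third_smallest_distance(points, v, b) >= bound - 1e-12)
```

The bound is strictly positive exactly because no three of the points are collinear, which is the role played by the general position assumption in the lemma.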

Proof of Proposition 2.4. We will only consider the more involved
case $p > 1$. Let $\{(\mu_k, \Sigma_k)\}_k \subset \Theta$ be a sequence such that

(A.5) $\lim_k \prod_{i=1}^n f_{(\mu_k,\Sigma_k)}(x_i) = \sup_{(\mu,\Sigma) \in \Theta} \prod_{i=1}^n f_{(\mu,\Sigma)}(x_i).$

Since $g$ is continuous and $g(0) > 0$, it must be

(A.6) $\liminf_k \prod_{i=1}^n f_{(\mu_k,\Sigma_k)}(x_i) > 0.$

Let $v_k^1, \dots, v_k^p$ and $\delta_k^1, \dots, \delta_k^p$ be the eigenvectors and eigenvalues of $\Sigma_k$. Let
$\Lambda_k := \inf\{\delta_k^1, \dots, \delta_k^p\}$ and let $j_k$ be such that $\Lambda_k = \delta_k^{j_k}$. First, we prove that
it is impossible that $\liminf_k \Lambda_k = 0$. Let us assume that, on the contrary,
there exists a subsequence, which we will denote as the original one, such
that $\lim_k \Lambda_k = 0$.

Since the points in $\mathcal{X}$ are in general position and, since $n > p$, we can
apply Lemma A.1 to obtain that there exists $d > 0$ such that

$I_k := \{i \in \{1, \dots, n\} : |(x_i - \mu_k)^T v_k^{j_k}| > d\}$

is a set which contains at least $(n - p)$ elements. Therefore, if $i \in I_k$, then

$(x_i - \mu_k)^T\Sigma_k^{-1}(x_i - \mu_k) \ge ((x_i - \mu_k)^T v_k^{j_k})^2(\Lambda_k)^{-1} > d^2(\Lambda_k)^{-1},$

and, since $g$ is nonincreasing, if $i \in I_k$, and $|\Sigma_k| \ge (\Lambda_k)^p$, we have that

(A.7) $\prod_{i=1}^n f_{(\mu_k,\Sigma_k)}(x_i) \le (\Lambda_k)^{-np/2} g(0)^p\, g(d^2(\Lambda_k)^{-1})^{n-p}.$

Thus, applying assumption Gp in (A.7), we have that if $k$ is big enough,

n 
i=l 

which converges to zero as $k \to \infty$ and contradicts (A.6).

Now, let $\Lambda^k := \sup\{\delta_k^1, \dots, \delta_k^p\}$. Since $|\Sigma_k| \ge (\Lambda_k)^{p-1}\Lambda^k$, we have that

n 
i=l 

and, to avoid contradictions with (A.6), we obtain that $\limsup_k |\Lambda^k| < \infty$.

Because $\liminf_k |\Lambda_k| > 0$ and $\limsup_k |\Lambda^k| < \infty$, we can conclude that
$\limsup_k \|\mu_k\| < \infty$ because, on the contrary, (A.6) would be false.

This means that every sequence which satisfies (A.5) is contained in a
compact set and, in consequence, it contains a convergent subsequence to,
say, $(\mu, \Sigma) \in \Theta$. An easy argument of continuity shows now that $(\mu, \Sigma)$ is the
point we are looking for. □

Proof of Proposition 2.5. Let $\{(\theta_k, \pi_k)\}_k$ in $\Theta \times [0, 1)$, $\theta_k = (\mu_k, \Sigma_k)$,
be a sequence such that

$\lim_k P_n L_{\theta_k,\pi_k/A} = \sup_{\theta,\pi} P_n L_{\theta,\pi/A}.$

Since $[0, 1)$ is bounded, we can assume that there exists $\pi = \lim_k \pi_k$.

Taking into account that the second summand in (2.8) is bounded above,
we can repeat the same reasoning as in Proposition 2.4 to show that there
exists a convergent subsequence of $\{\theta_k\}$ whose limit belongs to $\Theta$ and also
that $\pi < 1$. Thus, from the continuity of $f_\theta$ and of $P_\theta(A)$ [recall (2.12)], we obtain
that the maximum is attained at the limit of this subsequence.

The proof for the MLE(c) is the same, by keeping $\pi = 0$ fixed. □



Proof of Proposition 2.6. This proof goes along the same lines as
the one we gave for Proposition 2.4 because, under the restrictions, the term
$P_\theta(A)$ is bounded away from zero. □

Proof of Proposition 2.7. The first statement directly follows from
the expression (2.8) of the objective function, which, for $\pi = 0$, coincides with
that of the censored framework. Concerning the other statement, notice that
an equivalent expression for (2.8) is

(A.8) $P_n I_A \log(f_\theta/P_\theta(A)) + P_n\left(I_A \log((1-\pi)P_\theta(A)) + I_{A^c} \log(1-(1-\pi)P_\theta(A))\right).$

Let us denote by $\psi(\pi,\theta)$ the sum of the second and third summands
in this expression (which are the only ones depending on $\pi$). Note that
$\psi(\pi^*(\theta),\theta)$ does not depend on $\theta$. On the other hand, differentiation of $\psi$ with
respect to $\pi$ easily shows that, for every $\theta$, if $\pi > 0$ and $\pi > \pi^*(\theta)$, then
$\psi(\pi, \theta)$ is nonincreasing in $\pi$; thus, the maximum value of $\psi(\pi, \theta)$ on $[0, 1]$ is
$\psi(\pi^*(\theta),\theta)$ if $\pi^*(\theta) > 0$, else $\psi(0,\theta)$. Then it follows that $\hat\pi_{s,n} = \pi^*(\hat\theta_{s,n})$ when
$\hat\pi_{s,n} > 0$, and the maximum value of (A.8) under the restriction $P_\theta(A) >
P_n(A)$ is, as stated in the second item,

$P_n I_A \log(f_\theta/P_\theta(A)) + \psi(\pi^*(\theta),\theta).$ □

The proof of Theorem 2.1 is based on Lemma A.2. Let $\mu$ be a positive
$\sigma$-finite measure on $\beta^d$ such that the function $c$ on $\mathbb{R}^d$ defined by $c(\theta) =
\int \exp\{\sum_{j=1}^d \theta_j x_j\}\mu(dx)$ is not identically $+\infty$. $c$ is the so-called Laplace
transform of $\mu$. Its domain is the set $\{\theta \in \mathbb{R}^d : c(\theta) < +\infty\}$.

Lemma A.2 (Theorem 7.1 in [1]). Let $\kappa = \log c$ be the logarithm of the
Laplace transform of $\mu$. Then $\kappa$ is a closed convex function on $\mathbb{R}^d$ and is
strictly convex on its domain, provided $\mu$ is not concentrated on an affine
subspace of $\mathbb{R}^d$.

Proof of Theorem 2.1. We will employ the canonical form of (2.3),
obtained by a re-parameterization and the absorption of $h$ into $\lambda$, leading
to

(A.9) $f_\theta(x) = C(\theta)\exp\left\{\sum_{j=1}^d \theta_j T_j(x)\right\}.$

The expression within the brackets in (2.6) is the logarithm of a density
function, say, $g_{\theta,\pi}$, with respect to the $\sigma$-finite measure $\lambda|_A + \delta_{\{c\}}$,

(A.10) $g_{\theta,\pi}(x) := I_A(x)(1-\pi)f_\theta(x) + I_{\{c\}}(x)[(1-\pi)(1-P_\theta(A)) + \pi].$

It is straightforward to obtain the following exponential expression for
$g_{\theta,\pi}$, whenever the condition $1 - (1-\pi)P_\theta(A) > 0$ holds [or in an equivalent
way, whenever $-\pi < P_\theta(A^c)/P_\theta(A)$, allowing even negative values of $\pi$]:

(A.11) $g_{\theta,\pi}(x) = (1-(1-\pi)P_\theta(A)) \exp\left\{I_A(x)\left(\sum_{j=1}^d \theta_j T_j(x) + \log\frac{(1-\pi)C(\theta)}{1-(1-\pi)P_\theta(A)}\right)\right\}.$

This density is easily seen to lie within the class of exponential
distributions, if we add the parameter

$\theta_{d+1} = \log\frac{(1-\pi)C(\theta)}{1-(1-\pi)P_\theta(A)},$

leading to the density functions with respect to $\lambda|_A + \delta_{\{c\}}$,

(A.12) $h_{\theta,\theta_{d+1}}(x) := \exp\left\{\sum_{j=1}^d \theta_j (T_j(x)I_A(x)) + \theta_{d+1}I_A(x)\right\} \Big/ D(\theta,\theta_{d+1}).$

The hypothesis requiring that the $T$'s do not satisfy a linear constraint on
$A$ implies that $T_1, T_2, \dots, T_d, I_A$ also do not satisfy such a linear constraint
on $A$. This allows us, by Lemma A.2, to guarantee that $-\log D(\theta,\theta_{d+1})$ is
a closed strictly concave function on its domain.

On the other hand, from (A.11), the restriction $\pi \ge 0$ can be written as

(A.13) $\theta_{d+1} + \log\left(\int \exp\left\{\sum_{j=1}^d \theta_j T_j(x)\right\}\lambda^*(dx)\right) \le 0,$

where the term $I_{A^c}$ is included in a new measure $\lambda^* = \lambda|_{A^c}$.

Once more by Lemma A.2, the function on the left of (A.13) is a convex
function. Therefore, the restricted set defined by (A.13) is a convex set.

Let $P$ be a probability distribution verifying the hypotheses. The function

$P\log h_{\theta,\theta_{d+1}} = \sum_{j=1}^d \theta_j P(T_j I_A) + \theta_{d+1}P(A) - \log D(\theta,\theta_{d+1})$

is then a strictly concave function on its domain, so, if any, it has a unique
maximum point $(\theta_1, \dots, \theta_{d+1})$ on the restricted (convex) set (A.13).

The relation between $(\theta,\theta_{d+1})$ and $(\theta, \pi)$ given by (A.11) would give now
the only (if any) maximum point of $P L_{\theta,\pi/A} = P\log g_{\theta,\pi}$ under $\pi \ge 0$.

For the proof of the statements related to the MLE(t), note that by
resorting to the canonical form of the exponential family and absorbing $I_A$
into the measure $\lambda$, from Lemma A.2, it is straightforward that the function

$\log C(\theta) + \sum_{j=1}^d \theta_j P T_j - \log P_\theta(A)$

is a strictly concave function of $\theta$; thus, the results are immediate. □



The next lemma is easily deduced from Theorem 1 in [6]. 
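Lemma A.3. Let $(\mu_0,\Sigma_0) \in \Theta$. Given $r_0 > 0$, let

$\mathfrak{E}_{(\mu_0,\Sigma_0)}(r_0) := \{\mathcal{E}(\mu,\Sigma,r) : P_{(\mu_0,\Sigma_0)}[\mathcal{E}(\mu,\Sigma,r)] \ge P_{(\mu_0,\Sigma_0)}[\mathcal{E}(\mu_0,\Sigma_0,r_0)]\}.$

Then, the volume of $\mathcal{E}(\mu_0,\Sigma_0,r_0)$ is minimal in $\mathfrak{E}_{(\mu_0,\Sigma_0)}(r_0)$.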

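Proof of Proposition 2.10. Because of the regularity of the model,
the map $\theta \to P_\theta(\mathcal{E}((\mu_0,\Sigma_0),r))$ is continuously differentiable in a
neighborhood of $\theta_0$, so it suffices to show that for every fixed value of $\varsigma$ ($= \varsigma_0$)
the function has a local maximum at $\mu = \mu_0$ and $H = \Sigma_0/\varsigma_0^2$.

Let $\mu_1$ and $\Sigma_1$ with $|\Sigma_1| = |\Sigma_0|$. Because of the elliptical character of the
model, we have that $P_{(\mu_0,\Sigma_0)}(\mathcal{E}((\mu_0,\Sigma_0),r)) = P_{(\mu_1,\Sigma_1)}(\mathcal{E}((\mu_1,\Sigma_1),r))$. Thus,
if we assume that

$P_{(\mu_0,\Sigma_0)}(\mathcal{E}((\mu_0,\Sigma_0),r)) < P_{(\mu_1,\Sigma_1)}(\mathcal{E}((\mu_0,\Sigma_0),r)),$

then the absolute continuity of $P_{(\mu_1,\Sigma_1)}$ implies that, for some $r^* < r$,

$P_{(\mu_1,\Sigma_1)}(\mathcal{E}((\mu_0,\Sigma_0),r^*)) = P_{(\mu_1,\Sigma_1)}(\mathcal{E}((\mu_1,\Sigma_1),r)).$

Then, the volume of the ellipsoid $\mathcal{E}((\mu_0,\Sigma_0),r^*)$ would be strictly lower than
that of $\mathcal{E}((\mu_1,\Sigma_1),r)$ with the same probability, contradicting Lemma A.3. □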

Lemmas A.4 and A.5 include some well-known properties, and are stated
for reference.

Lemma A.4. If $P$ belongs to the GEM given by an elliptical family and
$A$ is the MVE of $P$, then:

1. $P(A) = 1/2$, and,

2. if $A = \mathcal{E}(\mu,\Sigma,r)$, then $\lim_{\varepsilon \to 0+} P(\mathcal{E}(\mu,\Sigma,r + \varepsilon)) = 1/2$.

The next lemma follows from the well-known fact that the class

$\mathcal{C} := \{\{x \in \mathbb{R}^p : |\langle x - \mu, v\rangle| < d\} : \mu \in \mathbb{R}^p, v \in S(0,1)$ and $d > 0\},$

and the class of all ellipsoids constitute two Vapnik-Červonenkis (VC) classes.

Lemma A.5. Let $\{X_n\}_n$ be a random sample taken from a probability
distribution $P$, then:

1. $\sup\{|P_n(A) - P(A)| : A$ is an ellipsoid$\} \to 0$, a.s.

2. $\sup\{|P_n(A \cap B) - P(A \cap B)| : A$ is an ellipsoid and $B \in \mathcal{C}\} \to 0$, a.s.
Lemma A.6. Let $g$ be a decreasing function which defines an elliptical
family on $\mathbb{R}^p$. Let $P = (1 - \pi_0)P_{\theta_0} + \pi_0 Q$, $0 \le \pi_0 < 1/2$ and $\theta_0 \in \Theta$, be a
distribution in the GEM of the elliptical family defined by $g$.

Let $A = \mathcal{E}(\mu_0,\Sigma_0,r_0)$ be the MVE of $P$. Then, for every $\eta > 0$, there exist
$d > 0$ and $\varepsilon > 0$, such that, if we denote $A^\varepsilon = \mathcal{E}(\mu_0,\Sigma_0,r_0 + \varepsilon)$, then

$\sup_{v \in S(0,1)} \sup_{\mu \in \mathbb{R}^p} P[A^\varepsilon \cap \{x \in \mathbb{R}^p : |\langle x - \mu, v\rangle| < d\}] < \eta.$

Proof. Let $\eta > 0$. Since $P_{\theta_0}$ is absolutely continuous, it is easy to show
that there exists $d > 0$ such that

$\sup_{v \in S(0,1)} \sup_{\mu \in \mathbb{R}^p} P_{\theta_0}[\{x \in \mathbb{R}^p : |\langle x - \mu, v\rangle| < d\}] < \eta.$

This and (2) in Lemma A.4 give the result. □

Lemma A.7. Assume the hypotheses of Theorem 3.2. Let $\delta_n^1, \dots, \delta_n^p$ be
the eigenvalues of $\Sigma_n$. Let $\Lambda_n = \inf\{\delta_n^1, \dots, \delta_n^p\}$. Then

$\liminf_n \Lambda_n > 0$, a.s.

Proof. We will only treat the case $p > 1$. Let $\gamma > p/2$ be the value
associated by Gp to $g$. Obviously, $2\gamma - p > 0$ and, then, there exists $\eta > 0$
such that $2\gamma(1 - 2\eta) > p$.

From Lemma A.6, there exist $\varepsilon > 0$ and $d > 0$ such that

$\sup_{v \in S(0,1)} \sup_{\mu \in \mathbb{R}^p} P[A^\varepsilon \cap \{x \in \mathbb{R}^p : |\langle x - \mu, v\rangle| < d\}] < \eta.$

Taking into account Proposition 3.1, Lemmas A.5 and A.4 and that
$I_A \log(f_{\theta_0})$ is $P$-integrable, we have that there exists a probability one set
$\Omega_0$ such that if $\omega \in \Omega_0$, then

(A.14) $P_n^\omega[A_n^\omega] \to P(A) = 1/2.$

There exists $N \in \mathbb{N}$ ($N = N(\omega)$) such that if $n \ge N$, then $A_n \subset A^\varepsilon$ and,

(A.15) $\sup_{v \in S(0,1)} \sup_{\mu \in \mathbb{R}^p} P_n[A_n \cap \{x \in \mathbb{R}^p : |\langle x - \mu, v\rangle| < d\}] < \eta,$

and

(A.16) $P_n I_A \log(f_{\theta_0}) \to P I_A \log(f_{\theta_0}).$

Let $\omega \in \Omega_0$. $\omega$ will remain fixed and will be omitted in the notation.

Statement (A.14) implies that requirements on $m$ in Proposition 2.5 hold
from an index onward, and then the MLE(s) exists from this index onward.

Let us denote $\mu_n = (\mu_n^1, \dots, \mu_n^p)$. Let $j_n$ be such that $\Lambda_n = \delta_n^{j_n}$. Let us
assume that there exists a subsequence such that $\lim_k \Lambda_{n_k} = 0$. To simplify
the notation, we will denote this subsequence with the same notation as the
original one. Let us consider the set

$B_n := \{x = (x^1, \dots, x^p) \in \mathbb{R}^p : (x^{j_n} - \mu_n^{j_n})^2 > d^2\}.$

From (A.15) we have that $P_n[B_n \cap A_n] \ge P_n(A_n) - \eta$ and now the proof
goes by repeating the same steps as in Proposition 2.4 with the set $I_k$ being
replaced there by the set $B_n$ here, because we have



i=l 

(A. 17) < (A„)~'^^/^^"-^"^^"^5(0)"''(i~^'''"^"''"'''^"-'~^-*(A„)'''"*^"^"'^'^"^~^) 

= ((A„)^'^(^"^^")"'')~P'^"(^")/^5(0)''(i~^'^^^"^^"^~''^)". □ 

Proof of Theorem 3.2. Let us denote $\hat\theta_{s,n} = (\pi_n, \mu_n, \Sigma_n)$ and let $A_n$
be an empirical MVE. First, we will prove that the sequence $\{(\pi_n, \mu_n, \Sigma_n)\}_n$
is a.s. included in a compact subset of $[0, 1) \times \Theta$. To this end, let $\Omega_0$ be a
probability one set whose points satisfy 1 in Lemma A.5, Proposition 3.1, (A.14),
(A.16) and Lemma A.7, and let $\omega \in \Omega_0$. This point will remain fixed
throughout the proof and we will make no reference to it in the notation.

The second term in (2.8) does not depend on $\pi$ and the third one is
bounded. Since $(\pi_n, \mu_n, \Sigma_n)$ maximizes (2.8) and the first term converges to
$-\infty$ if $\pi_n \to 1$, it may not happen that $\limsup \pi_n = 1$.

Let $\delta_n^1, \dots, \delta_n^p$ be the eigenvalues of $\Sigma_n$. Let $\Lambda^n = \sup(\delta_n^1, \dots, \delta_n^p)$.
Following the same steps as in Proposition 2.4, we would prove that if it were
$\limsup \Lambda^n = \infty$, then there would exist a subsequence such that



1=1 

and, for this sequence, the second term in (2.8) would converge to — oo, 
which is impossible because (A. 16) is satisfied. 

Therefore, the sequence {(vTn, Sri)}n is a.s. included in a compact sub- 
set of [0, 1) X 0. Let us consider a convergent subsequence {(vr^j., 5]„j.)}fc 
with limit (vr*, /x*, S*). 

Lemma A. 4 and (A. 14) trivially give that 

(A.18) P„,(^„,) log(l - Tin,) ^ IP(^) log(l - TT*). 

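Proposition 3.1 implies that if we denote $A = \mathcal{E}(\mu, \Sigma, r)$, then for every
$\varepsilon > 0$ there exists $N$ such that, if $k > N$, then

$$\mathcal{E}(\mu, \Sigma, r - \varepsilon) \subset A_{n_k} \subset \mathcal{E}(\mu, \Sigma, r + \varepsilon). \tag{A.19}$$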
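However, if we denote $\theta_{n_k} = (\mu_{n_k}, \Sigma_{n_k})$ and $\theta^* = (\mu^*, \Sigma^*)$, (2.12) implies that

$$\lim_k P_{\theta_{n_k}}(\mathcal{E}(\mu, \Sigma, r')) = P_{\theta^*}(\mathcal{E}(\mu, \Sigma, r')),$$

where $r' \in \{r - \varepsilon, r + \varepsilon\}$. From here, (A.19) and the continuity of $P_{\theta^*}$, we
obtain that

$$\lim_k P_{\theta_{n_k}}(A^c_{n_k}) = P_{\theta^*}(A^c).$$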

This, with (A.14), gives that

$$\lim_k P_{n_k}(A^c_{n_k}) \log\big((1 - \pi_{n_k}) P_{\theta_{n_k}}(A^c_{n_k}) + \pi_{n_k}\big) = P(A^c) \log\big((1 - \pi^*) P_{\theta^*}(A^c) + \pi^*\big). \tag{A.20}$$
The continuity of $g$ implies that $\lim_k f_{\theta_{n_k}} = f_{\theta^*}$. Moreover, the sequence
$\{f_{\theta_{n_k}}\}_k$ is uniformly bounded by a constant because the sequence $\{\Sigma_{n_k}\}_k$ is
contained in a compact set and $g$ is bounded by $g(0)$. Thus, taking
into account that 1 in Lemma A.5 implies that the sequence of distributions
$\{P_{n_k}\}_k$ converges in distribution to $P$, and that $A$ is a continuity set of $P$,
it is a standard exercise to prove that

$$P_{n_k}\big(I_{A_{n_k}} \log(f_{\theta_{n_k}})\big) \to P\big(I_A \log(f_{\theta^*})\big).$$

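This, (A.18) and (A.20) give that

$$\lim_k P_{n_k} L_{\pi_{n_k}, \theta_{n_k}/A_{n_k}} = P L_{\pi^*, \theta^*/A}. \tag{A.21}$$

On the other hand, from the assumptions on $\Omega_0$, it can be deduced that

$$\lim_k P_{n_k} L_{\pi_0, \theta_0/A_{n_k}} = P L_{\pi_0, \theta_0/A}. \tag{A.22}$$

But, from Proposition 2.3, we obtain that $P L_{\pi^*, \theta^*/A} \le P L_{\pi_0, \theta_0/A}$ and, by
definition of the smart estimate, we also have that

$$\lim_k P_{n_k} L_{\pi_0, \theta_0/A_{n_k}} \le \lim_k P_{n_k} L_{\pi_{n_k}, \theta_{n_k}/A_{n_k}}.$$

This, (A.21), (A.22) and the inequality (2.11) imply that $\theta_0 = \theta^*$.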

The consistency of the MLE(c) under the elliptical model can be proved
with the same scheme by considering the easier case $\pi = 0$.

The proof for the MLE(t) follows the same steps, once we show that the
dependence on a of the restrictions given by does not constitute any con- 
straint from the asymptotic point of view. This is proved in Proposition 3.2. 
□ 



Proof of Proposition 3.2. Let $\eta \in (0, 1/2 - \alpha)$. From Proposition 3.1
and the Glivenko-Cantelli property of the class of ellipsoids, we deduce that,






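for $n > n_0$ large enough (and depending on the sample), $P_{\theta_0}(A_n) > \alpha + \eta$
holds. On the other hand, (2.12) shows that for $\varepsilon > 0$ there exists $\delta > 0$ such
that $\sup_{B \in \beta^p} |P_\theta(B) - P_{\theta_0}(B)| < \varepsilon$ whenever $\|\theta - \theta_0\| < \delta$. Taking $\varepsilon = \eta$, both relations give
that $P_\theta(A_n) > \alpha$, so that $\theta \in \Theta^{\alpha}_{n}$, if $\|\theta - \theta_0\| < \delta$ and $n > n_0$. □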

Now we will adapt Section 3.2.4 in [26] to our semiparametric setup. We
only include the key points of the adapted proofs, which would otherwise
repeat the arguments there verbatim.

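Theorem A.1 (Extension of Theorem 3.2.16 in [26]). Let $\{M_n\}_n$ be
stochastic processes, all of them indexed by the same product $\Theta \times K$ of an
open subset $\Theta$ and a compact subset $K$ of two Euclidean spaces, and
$M : \Theta \times K \to \mathbb{R}$ be a deterministic function.

Assume that for every $\gamma \in K$ the function $\theta \to M(\theta, \gamma)$ has a unique
maximum $\theta_0$ where it is twice continuously differentiable w.r.t. $\theta$, with
nonsingular continuous (w.r.t. $\gamma$) second derivative matrix $V(\gamma)$. Suppose also that

$$\sqrt{n}\big(M_n(\tilde\theta_n, \gamma_n) - M(\tilde\theta_n, \gamma_n)\big) - \sqrt{n}\big(M_n(\theta_0, \gamma_n) - M(\theta_0, \gamma_n)\big)$$
$$= (\tilde\theta_n - \theta_0)^T Z_n(\tilde\theta_n, \gamma_n) + o^*_P(\|\tilde\theta_n - \theta_0\|^2) + o^*_P\big(\|\tilde\theta_n - \theta_0\| + \sqrt{n}\,\|\tilde\theta_n - \theta_0\|^2 + n^{-1/2}\big)$$

for every sequence $\tilde\theta_n = \theta_0 + o^*_P(1)$, every sequence $\{\gamma_n\} \subset K$, and a
uniformly tight sequence $Z_n(\tilde\theta_n, \gamma_n)$ of random vectors.

If the sequence $\hat\theta_n(\gamma_n)$ converges in outer probability to $\theta_0$ and satisfies

$$M_n(\hat\theta_n(\gamma_n), \gamma_n) \ge \sup_\theta M_n(\theta, \gamma_n) - o_P(n^{-1}),$$

for every $n \in \mathbb{N}$, then $\sqrt{n}(\hat\theta_n(\gamma_n) - \theta_0) = -(V(\gamma_n))^{-1} Z_n(\hat\theta_n(\gamma_n), \gamma_n) + o_P(1)$.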
Proof. For every sequence $h_n = o_P(1)$, the hypotheses yield

$$M_n(\theta_0 + h_n, \gamma_n) - M_n(\theta_0, \gamma_n) = \tfrac{1}{2} h_n^T V(\gamma_n) h_n + n^{-1/2} h_n^T Z_n(\theta_0 + h_n, \gamma_n) + o^*_P\big(\|h_n\|^2 + n^{-1/2}\|h_n\| + n^{-1}\big). \tag{A.23}$$

Take $h_n = \hat\theta_n(\gamma_n) - \theta_0$, and follow the proof in [26], taking into account
that the term $h_n^T V(\gamma_n) h_n$ on the right-hand side is bounded above by $-c\|h_n\|^2$
for some $c > 0$. This holds because, otherwise, there would exist a
sequence $\{\gamma_n\} \subset K$ such that the corresponding sequence of minimum
eigenvalues of $V(\gamma_n)$ would converge to 0. Then, for some subsequence
converging to some $\gamma \in K$, the continuity of $V$ would give the contradiction of the
singularity of $V(\gamma)$. [The same argument based on the compactness of $K$
makes it possible to guarantee that the eigenvalues of $V(\gamma_n)$ are bounded,
so that $-n^{-1/2}(V(\gamma_n))^{-1} Z_n(\hat\theta_n(\gamma_n), \gamma_n)$ is $O_P(n^{-1/2})$, and, hence, to apply
(A.23) also to $h_n = -n^{-1/2}(V(\gamma_n))^{-1} Z_n(\hat\theta_n(\gamma_n), \gamma_n)$, analogously to the
original proof.] □
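
The role of the candidate $-n^{-1/2}(V(\gamma_n))^{-1} Z_n$ in this argument can be seen from a short computation. The following is only a sketch: it treats $V$ and $Z$ as fixed (an assumption made here to isolate the algebra), takes $V$ symmetric and negative definite, and keeps only the two leading terms on the right-hand side of (A.23). The symbols $q$ and $h^{*}$ are introduced for this illustration only.

```latex
\begin{align*}
q(h) &= \tfrac12\, h^T V h + n^{-1/2}\, h^T Z, \\
\nabla q(h) &= V h + n^{-1/2} Z = 0
  \quad\Longrightarrow\quad h^{*} = -\,n^{-1/2}\, V^{-1} Z, \\
q(h^{*}) &= \tfrac{1}{2n}\, Z^T V^{-1} Z - \tfrac{1}{n}\, Z^T V^{-1} Z
  = -\tfrac{1}{2n}\, Z^T V^{-1} Z \;\ge\; 0 .
\end{align*}
```

Since $V$ is negative definite, $h^{*}$ is the unique maximizer of $q$, which agrees with the form of the limit $\sqrt{n}(\hat\theta_n(\gamma_n) - \theta_0) = -(V(\gamma_n))^{-1} Z_n + o_P(1)$ stated in the theorem.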

Now let $\Theta$ be the parameter space in the elliptical model and let $K$ be
a compact subset of $\Gamma$. Let $m_{\theta,\gamma}$ be a real function, consider the sample
probability, $P_n$, corresponding to $n$ i.i.d. observations from $P$, $M_n(\theta, \gamma) = P_n m_{\theta,\gamma}$
and $M(\theta, \gamma) = P m_{\theta,\gamma}$, as well as the empirical process
$G_n m_{\theta,\gamma} = \sqrt{n}(M_n(\theta, \gamma) - M(\theta, \gamma))$. The differentiability involved in the preceding
theorem can be guaranteed through the condition required in Lemma A.8.

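Lemma A.8 (Extension of Lemma 3.2.19 in [26]). Suppose that for every
$\gamma$ in the compact set $K$, there exists a vector-valued function $\dot m_{\theta_0,\gamma}$ such that,
for some $\delta > 0$,

$$\left\{ \frac{m_{\theta,\gamma} - m_{\theta_0,\gamma} - (\theta - \theta_0)^T \dot m_{\theta_0,\gamma}}{\|\theta - \theta_0\|} : \|\theta - \theta_0\| < \delta,\ \gamma \in K \right\}$$

is $P$-Donsker and that, uniformly for $\gamma \in K$,

$$P\big(m_{\theta,\gamma} - m_{\theta_0,\gamma} - (\theta - \theta_0)^T \dot m_{\theta_0,\gamma}\big)^2 = o(\|\theta - \theta_0\|^2).$$

Then, if $\tilde\theta_n = \theta_0 + o^*_P(1)$, we have that, uniformly in $\gamma$,

$$G_n\big(m_{\tilde\theta_n,\gamma} - m_{\theta_0,\gamma}\big) = (\tilde\theta_n - \theta_0)^T G_n \dot m_{\theta_0,\gamma} + o^*_P\big(\|\tilde\theta_n - \theta_0\| + \sqrt{n}\,\|\tilde\theta_n - \theta_0\|^2 + n^{-1/2}\big). \tag{A.24}$$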

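Proof. It suffices to adapt the proof in [26] to the function

$$f : \ell^\infty(\Theta_\delta \times K) \times (\Theta_\delta \times K) \to \mathbb{R}$$

given by $f(z, (\theta, \gamma)) = z(\theta, \gamma)$ ($\Theta_\delta := \{\|\theta - \theta_0\| < \delta\}$) and the stochastic
processes

$$Z_n(\theta, \gamma) = G_n\, \frac{m_{\theta,\gamma} - m_{\theta_0,\gamma} - (\theta - \theta_0)^T \dot m_{\theta_0,\gamma}}{\|\theta - \theta_0\|},$$

which, from the hypotheses, converge in $\ell^\infty(\Theta_\delta \times K)$ to a tight Gaussian
process $Z$. □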



Proof of Lemma 3.1. In order to simplify the notation, given $x \in \mathbb{R}^p$
and $\theta = (\mu, \Sigma)$, let us denote $x_\theta = (x - \mu)^T \Sigma^{-1} (x - \mu)$.

From the continuity and the nonincreasing character of $g$, we deduce
that there exist $K$, a compact neighborhood of $\gamma_0$, $\delta > 0$, and an ellipsoid
$\mathcal{E}(\gamma^*)$ such that $\bigcup_{\gamma \in K} \mathcal{E}(\gamma) \subset \mathcal{E}(\gamma^*)$ and, if $\Theta_\delta := \{\theta \in \Theta : \|\theta - \theta_0\| < \delta\}$, then
$\inf_{x \in \mathcal{E}(\gamma^*),\, \theta \in \Theta_\delta} g(x_\theta) > 0$.

Since the set $\{x_\theta : \theta \in \Theta_\delta, x \in \mathcal{E}(\gamma^*)\}$ is bounded, the second statement
follows from the differentiability w.r.t. $\theta$, leading to the Fréchet differentiability in
$L^2$. Also, note that the hypothesis on the continuity of the second derivative
implies that $g'$ is Lipschitz in its effective domain.

Thus, the components of $\dot m_{\theta,\gamma}$ are easily seen to be

/jg(^) - f^)9'{ye) dy 2S-^(x - ^J)g'{xe) 

Ii£^^^ 9{ye) dy g{xg) 

for the derivative w.r.t. $\mu$, while those corresponding to $\Sigma$ are

4(7) ^^'^(^ -f^)iy- f^V^'^g'jye) dy 2^^\x - fi){x - fifj:^^g'{xe) 
Iisi.,) aiye) dy 

To verify that (3.5) is P-Donsker, we only need to give some steps concern- 
ing the permanence of the Donsker property as developed in Section 2.10 in 

Table 1

Asymptotic efficiencies to estimate an element of $\mu$

                                          Dimension
Distribution  Estimator        p = 2    p = 3    p = 5    p = 10   p = 30

Gaussian      MLE(c)           0.1531   0.2032   0.2613   0.3263   0.3984
              MLE(c)_0.25      0.4049   0.4658   0.5301   0.5991   0.6627
              MLE(c)_0.10      0.6675   0.7184   0.7650   0.8085   0.8503
              MLE(c)_0.025     0.8821   0.9040   0.9242   0.9414   0.9579

t_1           MLE(c)           0.5147   0.5933   0.6414   0.6434   0.5975
              MLE(c)_0.25      0.8037   0.8374   0.8542   0.8492   0.8198
              MLE(c)_0.10      0.9387   0.9482   0.9530   0.9490   0.9350
              MLE(c)_0.025     0.9884   0.9900   0.9906   0.9896   0.9855

ts            MLE(c)           0.2889   0.3780   0.4713   0.5481   0.5737
              MLE(c)_0.25      0.6132   0.6839   0.7465   0.7931   0.8047
              MLE(c)_0.10      0.8450   0.8792   0.9070   0.9249   0.9280
              MLE(c)_0.025     0.9636   0.9724   0.9793   0.9833   0.9839

ts            MLE(c)           0.2469   0.3296   0.4193   0.5092   0.5583
              MLE(c)_0.25      0.5586   0.6339   0.7067   0.7657   0.7953
              MLE(c)_0.10      0.8083   0.8495   0.8850   0.9133   0.9240
              MLE(c)_0.025     0.9511   0.9631   0.9729   0.9800   0.9826

t_15          MLE(c)           0.2082   0.2805   0.3640   0.4544   0.5306
              MLE(c)_0.25      0.5025   0.5769   0.6535   0.7249   0.7768
              MLE(c)_0.10      0.7642   0.8108   0.8535   0.8912   0.9157
              MLE(c)_0.025     0.9337   0.9486   0.9627   0.9734   0.9801




Table 2

Asymptotic efficiencies to estimate a diagonal element of $\Sigma$

                                          Dimension
Distribution  Estimator        p = 2    p = 3    p = 5    p = 10   p = 30

Gaussian      MLE(t)           0.0266   0.0521   0.1023   0.1790   0.3000
              MLE(t)_0.25      0.1375   0.2059   0.2955   0.4184   0.5593
              MLE(t)_0.10      0.3594   0.4457   0.5508   0.6657   0.7744
              MLE(t)_0.025     0.6673   0.7364   0.8072   0.8674   0.9273
              MLE(c)           0.2666   0.2293   0.2161   0.2392   0.3206
              MLE(c)_0.25      0.4560   0.4217   0.4248   0.4813   0.5793
              MLE(c)_0.10      0.6551   0.6321   0.6534   0.7128   0.7894
              MLE(c)_0.025     0.8408   0.8435   0.8614   0.8918   0.9336

t_1           MLE(t)           0.2004   0.2990   0.3941   0.4597   0.4938
              MLE(t)_0.25      0.4941   0.5976   0.6778   0.7244   0.7457
              MLE(t)_0.10      0.7351   0.8126   0.8599   0.8879   0.8968
              MLE(t)_0.025     0.8778   0.9334   0.9619   0.9712   0.9747
              MLE(c)           0.4255   0.3736   0.4085   0.4611   0.4938
              MLE(c)_0.25      0.6619   0.6507   0.6873   0.7251   0.7458
              MLE(c)_0.10      0.8486   0.8480   0.8667   0.8884   0.8968
              MLE(c)_0.025     0.9588   0.9593   0.9661   0.9715   0.9747

ts            MLE(t)           0.0786   0.1492   0.2512   0.3749   0.4664
              MLE(t)_0.25      0.3028   0.4134   0.5381   0.6518   0.7279
              MLE(t)_0.10      0.5914   0.6877   0.7773   0.8498   0.8903
              MLE(t)_0.025     0.8415   0.8890   0.9282   0.9571   0.9690
              MLE(c)           0.3661   0.3064   0.3118   0.3865   0.4670
              MLE(c)_0.25      0.5749   0.5474   0.5854   0.6607   0.7283
              MLE(c)_0.10      0.7736   0.7739   0.8074   0.8549   0.8906
              MLE(c)_0.025     0.9269   0.9301   0.9433   0.9597   0.9692

ts            MLE(t)           0.0609   0.1182   0.2116   0.3347   0.4502
              MLE(t)_0.25      0.2552   0.3611   0.4883   0.6199   0.7199
              MLE(t)_0.10      0.5411   0.6430   0.7429   0.8248   0.8847
              MLE(t)_0.025     0.8195   0.8701   0.9144   0.9500   0.9683
              MLE(c)           0.3437   0.2881   0.2874   0.3545   0.4514
              MLE(c)_0.25      0.5481   0.5175   0.5517   0.6347   0.7207
              MLE(c)_0.10      0.7488   0.7465   0.7816   0.8342   0.8852
              MLE(c)_0.025     0.9131   0.9173   0.9327   0.9541   0.9685

t_15          MLE(t)           0.0458   0.0918   0.1717   0.2908   0.4302
              MLE(t)_0.25      0.2089   0.3038   0.4288   0.5684   0.6981
              MLE(t)_0.10      0.4796   0.5785   0.6890   0.7962   0.8735
              MLE(t)_0.025     0.7771   0.8400   0.8961   0.9364   0.9669
              MLE(c)           0.3168   0.2685   0.2630   0.3205   0.4331
              MLE(c)_0.25      0.5177   0.4855   0.5118   0.5931   0.7003
              MLE(c)_0.10      0.7188   0.7098   0.7452   0.8106   0.8746
              MLE(c)_0.025     0.8931   0.8988   0.9189   0.9433   0.9673
Table 3

Asymptotic efficiencies to estimate an off-diagonal element of $\Sigma$

                                          Dimension
Distribution  Estimator        p = 2    p = 3    p = 5    p = 10   p = 30

Gaussian      MLE(t)           0.0332   0.0631   0.1130   0.1929   0.3030
              MLE(t)_0.25      0.1621   0.2323   0.3247   0.4361   0.5718
              MLE(t)_0.10      0.4070   0.4874   0.5854   0.6890   0.7872
              MLE(t)_0.025     0.7151   0.7716   0.8333   0.8805   0.9315
              MLE(c)           0.0332   0.0631   0.1130   0.1929   0.3030
              MLE(c)_0.25      0.1621   0.2323   0.3247   0.4361   0.5718
              MLE(c)_0.10      0.4070   0.4874   0.5854   0.6890   0.7872
              MLE(c)_0.025     0.7151   0.7716   0.8333   0.8805   0.9315

t_1           MLE(t)           0.0581   0.0997   0.1540   0.2064   0.2371
              MLE(t)_0.25      0.1470   0.2023   0.2655   0.3219   0.3598
              MLE(t)_0.10      0.2202   0.2757   0.3390   0.3962   0.4331
              MLE(t)_0.025     0.2646   0.3178   0.3784   0.4319   0.4701
              MLE(c)           0.7773   0.7684   0.7634   0.7598   0.7553
              MLE(c)_0.25      0.8665   0.8697   0.8726   0.8762   0.8782
              MLE(c)_0.10      0.9401   0.9434   0.9468   0.9497   0.9505
              MLE(c)_0.025     0.9836   0.9852   0.9861   0.9871   0.9876

ts            MLE(t)           0.0301   0.0572   0.1021   0.1643   0.2244
              MLE(t)_0.25      0.1119   0.1584   0.2173   0.2878   0.3479
              MLE(t)_0.10      0.2164   0.2598   0.3122   0.3743   0.4271
              MLE(t)_0.025     0.3003   0.3332   0.3721   0.4217   0.4659
              MLE(c)           0.6862   0.6922   0.7047   0.7263   0.7444
              MLE(c)_0.25      0.7691   0.7927   0.8198   0.8495   0.8687
              MLE(c)_0.10      0.8728   0.8950   0.9166   0.9341   0.9472
              MLE(c)_0.025     0.9578   0.9674   0.9756   0.9820   0.9866

ts            MLE(t)           0.0261   0.0493   0.0899   0.1497   0.2194
              MLE(t)_0.25      0.1047   0.1488   0.2053   0.2753   0.3420
              MLE(t)_0.10      0.2166   0.2594   0.3074   0.3659   0.4215
              MLE(t)_0.025     0.3181   0.3451   0.3788   0.4206   0.4648
              MLE(c)           0.6518   0.6625   0.6809   0.7093   0.7387
              MLE(c)_0.25      0.7311   0.7610   0.7966   0.8342   0.8631
              MLE(c)_0.10      0.8422   0.8718   0.9000   0.9250   0.9426
              MLE(c)_0.025     0.9446   0.9571   0.9695   0.9791   0.9854

t_15          MLE(t)           0.0220   0.0424   0.0775   0.1317   0.2068
              MLE(t)_0.25      0.0967   0.1379   0.1913   0.2589   0.3321
              MLE(t)_0.10      0.2146   0.2564   0.3061   0.3587   0.4157
              MLE(t)_0.025     0.3348   0.3607   0.3884   0.4230   0.4599
              MLE(c)           0.6097   0.6236   0.6484   0.6827   0.7287
              MLE(c)_0.25      0.6841   0.7191   0.7617   0.8113   0.8541
              MLE(c)_0.10      0.8015   0.8371   0.8756   0.9108   0.9383
              MLE(c)_0.025     0.9227   0.9413   0.9589   0.9728   0.9832
[26], starting from the Lipschitz property of $g'$, which with Theorem 2.10.6
there leads to the Donsker property of the class $\{g'(x_\theta) : \|\theta - \theta_0\| < \delta\}$.

The uniform (lower or upper) bounds on the compact set $\mathcal{E}(\gamma^*)$ containing
the ellipsoids in the class permit us to apply the properties in
Examples 2.10.8 and 2.10.9 in [26] and conclude the Donsker property of the class
in (3.5). □
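
The proof of Lemma 3.1 works throughout with a quadratic form in $x$ and $\theta = (\mu, \Sigma)$ and with ellipsoids defined through it. The sketch below is a minimal illustration for $p = 2$ only. It assumes that the quadratic form is the squared Mahalanobis distance $(x - \mu)^T \Sigma^{-1} (x - \mu)$ and that an ellipsoid of radius $r$ collects the points where this quantity does not exceed $r^2$; both conventions, and the function names, are assumptions made for the illustration.

```python
def mahalanobis_sq_2d(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T sigma^{-1} (x - mu) for p = 2.

    sigma is a 2 x 2 matrix given as nested pairs ((a, b), (c, d)).
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("sigma must be positive definite")
    u0, u1 = x[0] - mu[0], x[1] - mu[1]
    # sigma^{-1} = (1 / det) * [[d, -b], [-c, a]]
    return (d * u0 * u0 - (b + c) * u0 * u1 + a * u1 * u1) / det


def in_ellipsoid(x, mu, sigma, r):
    """True when x lies in the ellipsoid {y : squared distance <= r^2}.

    Assumed convention: the radius r enters through its square.
    """
    return mahalanobis_sq_2d(x, mu, sigma) <= r * r
```

With the identity matrix the quadratic form reduces to the squared Euclidean distance, which gives a quick sanity check of the helper.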

APPENDIX B: ASYMPTOTIC EFFICIENCY 

Tables 1-3 show the efficiency of the proposed estimators in the estimation
of an element of $\mu$, and of arbitrary diagonal and off-diagonal elements of $\Sigma$, in
several dimensions, for the multivariate Gaussian and some $t$ distributions.

We analyze the MLE(c) and MLE(t) and the estimators based on enlarged
versions of the MVE to cover $1 - \alpha$ of the theoretical probability [MLE(c)$_\alpha$
or MLE(t)$_\alpha$]. This assures maximum BP of our equivariant estimators.
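
As a rough illustration of what enlarging an ellipsoid to cover $1 - \alpha$ of the theoretical probability means, the sketch below treats only the simplest case, a standard bivariate Gaussian. There the squared distance to the center follows a chi-square law with 2 degrees of freedom, so the squared radius has the closed form $-2 \log \alpha$. The function names and the Monte Carlo check are illustrative assumptions, not the procedure used for the tables.

```python
import math
import random


def coverage_radius_sq(alpha):
    """Squared radius r^2 with P(|X|^2 <= r^2) = 1 - alpha for X ~ N(0, I_2).

    Assumption: standard bivariate Gaussian, so |X|^2 is chi-square with
    2 degrees of freedom and P(|X|^2 <= t) = 1 - exp(-t / 2).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return -2.0 * math.log(alpha)


def empirical_coverage(radius_sq, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(|X|^2 <= radius_sq) for X ~ N(0, I_2)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_draws):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if x * x + y * y <= radius_sq:
            inside += 1
    return inside / n_draws


if __name__ == "__main__":
    # Levels matching the subscripts used in the tables.
    for alpha in (0.25, 0.10, 0.025):
        r2 = coverage_radius_sq(alpha)
        print(alpha, round(r2, 4), round(empirical_coverage(r2), 4))
```

Smaller values of $\alpha$ give larger ellipsoids, so fewer observations are discarded.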

When estimating the components of $\mu$, the efficiencies of the truncated
and censored estimates coincide, and we only show those of the censored
one.

The efficiencies have been computed by comparing the values of the
asymptotic variances (in Theorem 3.3) with the Cramér-Rao bound. The
integrals involved have been computed by the Monte Carlo method with 500,000
repetitions.
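
To make this recipe concrete (a ratio between the Cramér-Rao bound and an asymptotic variance, with an integral approximated by Monte Carlo), here is a toy sketch in one dimension. It is not the computation behind Tables 1-3 and does not use the formulas of Theorem 3.3. It assumes a $N(\mu, 1)$ sample with known unit variance, truncated to a fixed interval $[-c, c]$ centered at the true location, in which case the efficiency of the truncated-likelihood location estimate reduces to $E[X^2 1\{|X| \le c\}] = P(|X| \le c) - 2c\varphi(c)$. All function names are hypothetical.

```python
import math
import random


def exact_efficiency(c):
    """Closed form of E[X^2 1{|X| <= c}] for X ~ N(0, 1).

    Integration by parts gives P(|X| <= c) - 2 c phi(c).
    """
    phi_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    return math.erf(c / math.sqrt(2.0)) - 2.0 * c * phi_c


def monte_carlo_efficiency(c, n_draws=200_000, seed=0):
    """Monte Carlo approximation of the same integral (toy setting only)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= c:
            total += x * x
    return total / n_draws


if __name__ == "__main__":
    for c in (0.6745, 1.1503, 1.6449):
        print(c, round(exact_efficiency(c), 4), round(monte_carlo_efficiency(c), 4))
```

In this toy setting the efficiency increases with $c$, that is, as less of the sample is trimmed.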